Introduction


Inverse planning is at the heart of the prostate Volumetric Modulated Arc Therapy (VMAT) treatment procedure and critically determines its level of success. As practiced now, the capacity of VMAT is greatly underutilized because of the inferior computing performance of existing optimization methods. An alternative mathematical approach that improves both efficiency and efficacy is needed and is the center of this research. We propose to develop a new inverse planning tool, based on the novel idea of superiorization, to replace the classical constrained optimization approaches employed in clinics today for prostate VMAT cases.

Towards this goal, year 1 of the training award focused on formulating the VMAT problem as a constrained superiorization problem and on developing a framework of fast converging inverse planning algorithms. The new approach was then implemented, tested and evaluated on a previously treated prostate cancer case, where initial results were obtained. In year 2, the work concentrated on further developing the modality assumed for superiorization when applied to inverse planning in radiation therapy. This work was implemented and compared with the previously developed method. Towards the overarching goal of the award, we expanded the superiorization framework to other applications, such as proton imaging and therapy, which need the same kind of computational relief.


Body 

1.  Research  Accomplishments 

SOW  Aim  1:  Develop  algorithms  for  inverse  planning  using  superiorization  techniques 
for  prostate  VMAT 


Meeting the goal outlined in SOW Aim 1, we have studied the problem of inverse planning for prostate VMAT and developed a framework of algorithms using the superiorization methodology that is specifically tailored to this application. We first defined the problem mathematically by reformulating it as a linear feasibility problem (instead of a minimization problem) and proposed solving it with the superiorization methodology. In developing the tools, we have also generalized the approach to include other medical physics applications, and provided conditions that are simple to meet both in theory and in practice. Our claims were proved mathematically and the results were submitted to three journal (archival) publications [2, 4, 11].


Task  1:  Formulating  the  VMAT  treatment  planning  as  a  constrained  superiorization  problem 


Our approach to VMAT treatment planning started by studying the current mathematical models used for this application. Since the superiorization methodology requires a different mathematical formulation, the first step was to model the problem accordingly.

Consider  the  system  of  equations 

Ax  =  d,  (1) 


where $A$ is the $J \times I$ dose matrix that maps any intensity of beamlets vector $x = (x_i)_{i=1}^{I} \in \mathbb{R}^I$ onto a dose in voxels vector $d = (d_j)_{j=1}^{J} \in \mathbb{R}^J$. Here $I$ is the total number of beamlets and $J$ is the total number of voxels.
The minimization problem can then be formulated as

minimize $\sum_{s=1}^{S} \lambda_s \| A_s x - d^{(s)} \|^2$
subject to $x \geq 0$,  (2)

where the index $s$ stands for different structures, $A_s$ is the submatrix of $A$ related to structure $s$, $d^{(s)}$ is the subvector of $d$ related to structure $s$, and $\lambda_s$ is the importance factor associated with the $s$th structure, which is decided by the planner. It is assumed that $x$ is achievable using apertures (aperture constraints).
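To make the weighted objective of (2) concrete, the following is a minimal sketch in plain Python. It assumes the norm in (2) is the squared Euclidean norm, and all names (`mat_vec`, `weighted_least_squares`) and the toy numbers are hypothetical illustrations, not part of the reported implementation.

```python
def mat_vec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def weighted_least_squares(blocks, prescriptions, weights, x):
    """Evaluate sum_s lambda_s * ||A_s x - d^(s)||^2 (squared norm assumed)."""
    total = 0.0
    for A_s, d_s, lam_s in zip(blocks, prescriptions, weights):
        residual = [dose - goal for dose, goal in zip(mat_vec(A_s, x), d_s)]
        total += lam_s * sum(r * r for r in residual)
    return total


# Hypothetical toy data: two beamlets, a two-voxel target and a one-voxel OAR.
A_target = [[1.0, 0.5], [0.5, 1.0]]
A_oar = [[0.2, 0.1]]
x = [40.0, 40.0]  # nonnegative beamlet intensities
objective = weighted_least_squares(
    [A_target, A_oar], [[60.0, 60.0], [0.0]], [1.0, 0.5], x
)
```

In this toy case the target receives exactly its prescribed dose, so the whole objective value comes from the OAR block, scaled by its importance factor.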

Assume that we have $S$ structures, indexed by $s = 1, 2, \ldots, S$ (including the complement of all identified structures), and denote by $O_s$ the set of indices of voxels that belong to the $s$th structure, such that

$O_s = \{ j_{s,1}, j_{s,2}, \ldots, j_{s,m(s)} \}$,  (3)

where $m(s)$ is the number of voxels in this structure. Then the system matrix $A$ can be partitioned into blocks

$A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_S \end{pmatrix}$,  (4)

so that a submatrix $A_s$ contains the rows of $A$ whose indices appear in $O_s$ (similarly, let $d^{(s)}$ be the subvector of $d$ whose component indices appear in $O_s$), and then the system becomes

$\begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_S \end{pmatrix} x = \begin{pmatrix} d^{(1)} \\ d^{(2)} \\ \vdots \\ d^{(S)} \end{pmatrix}$.  (5)

An optimization method aims at satisfying the system (1) (equivalently (5)) while minimizing a given objective function.


Reformulating the problem as a constrained superiorization problem: We suggest the following modifications to the above modality. Replace the prescription method that gives rise to the system $Ax = d$ in (1) by a more flexible one in which we ask the planner to provide lower and upper dose-bound vectors, $\underline{d}$ and $\overline{d}$, respectively, on all voxels in all structures, and instead of (2) we aim at solving the following linear feasibility problem

$\underline{d} \leq Ax \leq \overline{d}$.  (6)

By transforming the problem of (1) into a linear feasibility problem of the form (6), we allow many iterative projection methods to derive a solution. Since many of these algorithms are also perturbation resilient, this formulation enables the superiorization methodology to be applied to the VMAT inverse planning problem. Specifically, methods that belong to two classes of projection methods, String Averaging Projection (SAP) and Block-Iterative Projection (BIP) methods, can be applied to this formulation and find a superior solution in addition to satisfying the feasibility constraints (see [1, 3]). That is, an $x$ obtained by a projection method alone is an intensity of beamlets vector that tries to solve (6), while a projection method that is also perturbation resilient allows for obtaining an $x$ that solves (6) and is also superior with respect to an objective function.
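As an illustration of what solving the feasibility problem (6) means for a candidate intensity vector, here is a small sketch that measures how far the voxel doses fall outside their prescribed intervals. The function names and the toy matrix are assumptions made for the example; the report does not specify how constraint violation was measured in its implementation.

```python
def dose(A, x):
    """Dose in each voxel, d = A x, for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def max_violation(A, x, lower, upper):
    """Largest amount by which any voxel dose leaves its interval; 0.0 if none does."""
    worst = 0.0
    for d_j, lo, hi in zip(dose(A, x), lower, upper):
        worst = max(worst, lo - d_j, d_j - hi)
    return worst


def is_feasible(A, x, lower, upper, tol=1e-9):
    """True when lower <= A x <= upper holds in every voxel, up to tol."""
    return max_violation(A, x, lower, upper) <= tol


# Hypothetical toy problem: three voxels, two beamlets.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lower = [1.0, 1.0, 0.0]
upper = [2.0, 2.0, 3.5]
```

A measure of this kind plays the role of an incompatibility indicator: it is zero exactly when (6) is satisfied and grows as the dose leaves the prescribed bounds.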
The solution vector $x$ of the beamlet intensities that results from the superiorization approach will then be evaluated. Tools such as dose volume histograms (DVHs) will help assess the dose conformality to the prostate (the target) and the dose to the organs at risk (OARs). These will be compared against what is recommended by a physician in the clinic and governed by the specifications of the Radiation Therapy Oncology Group (RTOG) protocol for prostate cancer patients [5].

The adaptation of our model based on the RTOG protocol is as follows: Given a structure $s$ that is an OAR, we define $\overline{d}^{(s)}$ to be the upper-bound subvector of the prescribed dose

$\overline{d}^{(s)} = d^{(s)}$,  (7)

and define $\underline{d}^{(s)}$ to be a lower-bound subvector for any target structure $s$

$\underline{d}^{(s)} = d^{(s)}$.  (8)

This allows the constraints in (6) to be written as

$0 \leq A_s x \leq \overline{d}^{(s)}$,  (9)

for an OAR structure $s$ and as

$\underline{d}^{(s)} \leq A_s x \leq e^{(s)}$,  (10)

for a target structure $s$, where $e^{(s)}$ is a clinically specified upper-bound subvector for the target.

In assessing the solution provided by the superiorization method, if the acceptance criteria are not met, then a refined selection of $\underline{d}$ and $\overline{d}$ will be provided and the process will repeat until a superior feasible solution is found (this step is identical to how it is done in the clinic today).


Task  2:  Development  of  a  framework  for  fast  converging  inverse  planning  superiorization  techniques 

And 

Task  7:  Investigate  the  underlying  principles  and  put  their  concept  on  a  firm  mathematical  ground 

In developing a framework for fast converging inverse planning superiorization techniques we first identified several problems that currently exist in optimization methods. In classical optimization it is assumed that there is a constraints set $C$ and the task is to find an $x \in C$ for which $\phi(x)$ is minimal. Problems with this approach are the following: (1) The constraints may not be consistent, so $C$ could be empty and the optimization task as stated would not have a solution. (2) Iterative methods of classical constrained optimization typically converge to a solution only in the limit. In practice some stopping rule is applied to terminate the process and the actual output at that time may not be in $C$ and, even if it is in $C$, it is most unlikely to be a minimizer of $\phi$ over $C$.

Both problems were addressed in the newly developed superiorization framework. Mathematical definitions and conditions were introduced and theoretically proven. The new foundations include two new notions: constraints-compatibility and strong perturbation resiliency. The new concepts allow the modality to take into account the infeasibility and practical convergence problems that exist in optimization methods. More specifically, in the superiorization model we suggested replacing the constraints set $C$ by a nonnegative real-valued function $Pr$ that serves as an indicator of how incompatible a given $x$ is with the constraints. Then the merit of an actual output of an algorithm is given by the smallness of the two numbers $Pr(x)$ and $\phi(x)$. Roughly, if an iterative algorithm produces an output $x$, then its superiorized version will produce an output $x'$ for which $Pr(x')$ is not larger than $Pr(x)$, but (in general) $\phi(x')$ is much smaller than $\phi(x)$.
In addition to the theoretical developments of superiorization, a practical and systematic way was developed to turn any iterative algorithm that solves a feasibility problem into an algorithm that does superiorization. For an iterative algorithm $P$ and for any optimization criterion $\phi$ for which we know how to produce nonascending vectors (see the definition on p. 5536 in [4]), the following pseudocode automatically takes $P$ and produces a version of $P$ that is superiorized for $\phi$ (exact details of the procedure can be found on page 5537 in [4]):

Superiorized Version of the Basic Algorithm

1.  set $k = 0$
2.  set $y^k = y^0$
3.  set $\ell = -1$
4.  repeat
5.      set $n = 0$
6.      set $y^{k,n} = y^k$
7.      while $n < N$
8.          set $v^{k,n}$ to be a nonascending vector for $\phi$ at $y^{k,n}$
9.          set loop = true
10.         while loop
11.             set $\ell = \ell + 1$
12.             set $\beta_{k,n} = \gamma_\ell$
13.             set $z = y^{k,n} + \beta_{k,n} v^{k,n}$
14.             if $\phi(z) \leq \phi(y^k)$ then
15.                 set $n = n + 1$
16.                 set $y^{k,n} = z$
17.                 set loop = false
18.     set $y^{k+1} = A_C(y^{k,N})$
19.     set $k = k + 1$
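The control flow of the pseudocode can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this work: the objective is taken to be a one-dimensional total variation, the step sizes are assumed to be $\gamma_\ell = \gamma^\ell$ with $\gamma = 0.5$, the nonascending vector is built from the partial derivatives wherever they exist (zero elsewhere), and the basic algorithm is a hypothetical box projection `clip_to_box`. All function names are invented for the example.

```python
import math


def total_variation(x):
    """1D total variation: sum of absolute jumps between neighbouring entries."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def tv_nonascending_vector(x):
    """Normalized negative of the partial derivatives of the 1D TV.

    A component is left at zero wherever the partial derivative is undefined
    (a neighbouring jump equals zero)."""
    n = len(x)
    jump = [x[i + 1] - x[i] for i in range(n - 1)]
    sign = lambda t: (t > 0) - (t < 0)
    u = [0.0] * n
    for i in range(n):
        left = jump[i - 1] if i > 0 else None
        right = jump[i] if i < n - 1 else None
        if left == 0 or right == 0:
            continue
        u[i] = (sign(left) if left is not None else 0) - (
            sign(right) if right is not None else 0
        )
    norm = math.sqrt(sum(c * c for c in u))
    return [-c / norm for c in u] if norm > 0 else u


def superiorize(basic_step, phi, direction, y0, n_inner=2, gamma=0.5, sweeps=10):
    """Interleave objective-reducing perturbations with the basic algorithm."""
    y, ell = list(y0), -1
    for _ in range(sweeps):                 # lines 4-19 ("repeat")
        y_kn, phi_yk = list(y), phi(y)
        for _ in range(n_inner):            # lines 7-17
            v = direction(y_kn)
            while True:                     # lines 10-17
                ell += 1
                beta = gamma ** ell         # assumed step sizes gamma_ell
                z = [a + beta * b for a, b in zip(y_kn, v)]
                if phi(z) <= phi_yk:
                    y_kn = z
                    break
        y = basic_step(y_kn)                # line 18
    return y


# Hypothetical basic algorithm: projection onto the box 1 <= y_i <= 3.
LOWER, UPPER = 1.0, 3.0


def clip_to_box(y):
    return [min(max(c, LOWER), UPPER) for c in y]


start = [0.0, 5.0, 0.0, 5.0]
plain = clip_to_box(start)
superiorized = superiorize(clip_to_box, total_variation, tv_nonascending_vector, start)
```

In this toy run both outputs satisfy the box constraints, while the superiorized output has a lower total variation than the output of the basic algorithm alone.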

By bridging the gap that typically exists between theory and practice in the new model, superiorization was made more general. That is, the framework fits many other medical physics applications, not just VMAT or radiation therapy inverse planning. All the results mentioned briefly here have been published in an archival journal publication in Medical Physics [4]; see the Appendix Section for the full manuscript.

Another accomplishment related to this task touches on an additional aspect of superiorization. Constrained optimization problems that arise in real-life applications are often huge (one such example is the VMAT problem). It can then happen that the traditional algorithms for constrained optimization require computational resources that are not easily available and, even if they are, the time needed to produce an acceptable output is too long to be practicable. As part of our goal to show that superiorization can handle large problems efficiently, we illustrated that the computational requirements of a superiorized algorithm can be significantly lower than those of a traditional optimization algorithm by comparing superiorization with the projected subgradient method (PSM), which is a standard method of classical optimization. Table 1 summarizes this comparison. In our experiment, we set the stopping rule to guarantee that the output of the superiorization method is at least as constraints-compatible as the output of the PSM. The superiorization method clearly outperformed the PSM: it obtained a result with a lower objective function value (a TV of 873 versus 919) at less than one twentieth of the computational cost (102 seconds versus 2217 seconds).

| Method | Total Variation value | Time (seconds) |
|---|---|---|
| projected subgradient method | 919 | 2217 |
| superiorization method | 873 | 102 |

Table 1: Performance comparison of the projected subgradient method and the superiorization method with Total Variation as the objective function.

The  complete  report  that  summarizes  this  work  was  published  in  the  Journal  of  Optimization  Theory 
and  Applications  [2].  It  is  attached  to  this  report  in  the  Appendix  Section. 

Task  3:  Implementation  and  testing  of  the  developed  algorithms 

And 

Task 5: Early-stage algorithm testing on prostate cases

And 

Task  6:  Testing  on  clinical  data 

In these tasks we wanted to assess our proposed approach to using superiorization on a realistic yet simple test case. The goal set here is two-fold: the first is to show that the developed method can produce good results and the second is to obtain a clear indication of whether the nonascending-type superiorization techniques should be replaced with alternative derivative-free approaches (see SOW Task 4: the development of alternative derivative-free techniques to superiorization).

We proposed to use as a test case in this task a previously treated intensity modulated radiation therapy (IMRT) prostate patient case. As was explained in the research proposal, the VMAT technique delivers an IMRT-type treatment in a single arc. Getting good results on a previously treated IMRT case would establish a level of confidence that the superiorization method can deliver superior results, using a previously treated clinical case as the reference. The modality that was given above (in Task 1) is identical for these two radiation therapy techniques (i.e., IMRT and VMAT); the difference lies in the size of the problem and its level of complexity. Since superiorization had never been tried with any type of radiation therapy treatment planning, it is important to provide such evidence on an actual clinical case. The mathematical model which we have developed, along with the theories and proofs of the superiorization methodology (in Task 2), fits both problems. Satisfactory results will encourage us to continue developing the method as proposed in Tasks 1 and 2 and to tailor it further to the VMAT approach.

Algorithmic operator and objective function The framework that was developed is quite general for many medical physics applications. With the modality of the superiorization approach in (6), a choice of a projection operator that is perturbation resilient is needed, as well as a choice of an objective function. The algorithmic operator that was chosen for our implementation was the Algebraic Reconstruction Technique (ART) for inequality constraints. This operator was proven to be perturbation resilient in [1]. The constraints of the system in (6) can be thought of as hyperslabs. The algorithm projects the current point according to its location in relation to the two hyperplanes that form a hyperslab. A geometrical description of this feasibility problem is provided in Figure 1.

Figure 1: Geometrical description of ART with inequality constraints

The analytical formulation associated with it is the following:

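$x^{k+1} = \begin{cases} x^k, & \text{if } c_i \leq \langle a^i, x^k \rangle \leq d_i \text{ (case A)}, \\ x^k + \lambda_k \dfrac{d_i - \langle a^i, x^k \rangle}{\| a^i \|^2} a^i, & \text{if } d_i < \langle a^i, x^k \rangle \text{ (case B)}, \\ x^k + \lambda_k \dfrac{c_i - \langle a^i, x^k \rangle}{\| a^i \|^2} a^i, & \text{if } \langle a^i, x^k \rangle < c_i \text{ (case C)}. \end{cases}$  (11)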
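The three cases of (11) translate into a few lines of code. The sketch below handles a single hyperslab and assumes that $a^i$ denotes one row of the dose matrix, that $c_i$ and $d_i$ are the lower and upper bounds of that hyperslab, and that $\lambda_k$ is a relaxation parameter; the function name `art_step` is hypothetical.

```python
def art_step(x, a, c, d, relax=1.0):
    """One ART step for the hyperslab c <= <a, x> <= d.

    Case A: x already lies inside the hyperslab and is returned unchanged.
    Case B: <a, x> > d, so x moves towards the upper hyperplane.
    Case C: <a, x> < c, so x moves towards the lower hyperplane.
    """
    inner = sum(ai * xi for ai, xi in zip(a, x))
    if c <= inner <= d:
        return list(x)
    bound = d if inner > d else c
    scale = relax * (bound - inner) / sum(ai * ai for ai in a)
    return [xi + scale * ai for xi, ai in zip(x, a)]


# Hypothetical hyperslab 1 <= x_1 + x_2 <= 5.
row, low, high = [1.0, 1.0], 1.0, 5.0
inside = art_step([1.0, 2.0], row, low, high)   # case A
above = art_step([3.0, 4.0], row, low, high)    # case B
below = art_step([0.0, 0.0], row, low, high)    # case C
```

With `relax=1.0` the step is an orthogonal projection onto the nearest bounding hyperplane; a full sweep would apply this step to each row of the system in turn.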

The  objective  function  used  in  our  implementation  was  the  total  variation  (TV)  functional  of  the  beamlet 
intensity  vector  x,  see  Eq.  (12)  in  [4]  and  the  discussion  in  the  research  proposal  under  Specific  Aim  1 
regarding  this  choice.  We  denote  herein  the  superiorization  algorithm  that  uses  TV  as  the  objective  function 
by  TV-Superiorization. 

Prostate patient data and planning The data for testing the approach came from a previously treated prostate cancer patient. A seven-field IMRT radiation treatment plan was created. The organs that were included in the plan were the prostate (target), rectum, bladder, small bowel (OARs) and the full body. Figure 2 shows the CT and the contours of these structures. Using RTOG 0815 [5], we set in Table 2 the acceptance criteria for the implemented TV-Superiorization algorithm.
Figure 2: CT of the prostate patient case used in the experiment.


| Organ | Target? | Acceptance criteria |
|---|---|---|
| I. Prostate | Yes | 1. Dose will be normalized s.t. 98% of the PTV receives the prescription dose. (Prescribed dose to PTV is 79.2 Gy.) |
| | | 2. The maximum allowable dose within the PTV is 107% of the prescribed dose (i.e., maximum allowed dose is 84.744 Gy). |
| | | 3. The minimum allowable dose within the PTV should be >95% of the prescribed dose (i.e., 100% of the dose should be >75.24 Gy). |
| II. Rectum | No | 1. No more than 15% volume receives dose that exceeds 75 Gy |
| | | 2. No more than 25% volume receives dose that exceeds 70 Gy |
| | | 3. No more than 35% volume receives dose that exceeds 65 Gy |
| | | 4. No more than 50% volume receives dose that exceeds 60 Gy |
| III. Bladder | No | 1. No more than 15% volume receives dose that exceeds 80 Gy |
| | | 2. No more than 25% volume receives dose that exceeds 75 Gy |
| | | 3. No more than 35% volume receives dose that exceeds 70 Gy |
| | | 4. No more than 50% volume receives dose that exceeds 65 Gy |
| IV. Small Bowel | No | 1. Upper bound is set to 52 Gy. |

Table 2: Acceptance criteria for prostate patients according to RTOG 0815 [5].
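Dose-volume criteria of the kind listed in Table 2 can be checked mechanically from a list of voxel doses. The sketch below is illustrative only: it assumes all voxels of a structure have equal volume, and the names `percent_volume_above`, `meets_oar_criteria` and `RECTUM_LIMITS` are hypothetical. It also reproduces the arithmetic behind the PTV bounds (107% and 95% of 79.2 Gy).

```python
def percent_volume_above(doses, threshold_gy):
    """Percentage of voxels (equal volumes assumed) whose dose exceeds the threshold."""
    return 100.0 * sum(1 for dose in doses if dose > threshold_gy) / len(doses)


def meets_oar_criteria(doses, limits):
    """limits: (dose in Gy, maximum percent volume allowed above that dose) pairs."""
    return all(percent_volume_above(doses, gy) <= pct for gy, pct in limits)


# Rectum rows of Table 2 as (dose, allowed percent volume) pairs.
RECTUM_LIMITS = [(75.0, 15.0), (70.0, 25.0), (65.0, 35.0), (60.0, 50.0)]

# PTV bounds derived from the 79.2 Gy prescription.
PRESCRIBED_GY = 79.2
ptv_max_gy = 1.07 * PRESCRIBED_GY  # 107% of the prescription
ptv_min_gy = 0.95 * PRESCRIBED_GY  # 95% of the prescription

# Hypothetical ten-voxel rectum dose samples, in Gy.
acceptable = [76.0, 72.0, 66.0, 61.0, 50.0, 40.0, 30.0, 20.0, 10.0, 5.0]
too_hot = [76.0, 76.0] + [0.0] * 8
```

Each threshold of a dose-volume criterion corresponds to reading one point off the DVH curve of the structure.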
Figure 3: DVH plots for a prostate case experiment with and without TV-Superiorization.


Results We compared the results obtained with and without superiorization. The TV-Superiorization algorithm was able to meet the RTOG acceptance criteria while the algorithm without TV-Superiorization was not. Moreover, the TV-Superiorization algorithm achieved this in only 12 iterations. Figure 3 shows the DVH curves of the two algorithms side by side. The solid lines represent the TV-Superiorization algorithm and the dashed lines represent the algorithm without superiorization. The corresponding numbers for assessing the acceptance criteria are specified in Table 3. As can be seen, the criteria based on the RTOG protocol [5] were fully met by the superiorization method for this prostate case.


Task  4:  Alternative  approach 

The goal of this task was modified to reflect the success of Tasks 3 and 5. Instead of derivative-free alternatives, we developed a different modality to be tested against the original one proposed in Task 1, one that is also directly inherited from the mathematical foundation laid out in Task 7. In the newly proposed approach we aimed at replacing the linear two-sided inequalities feasibility model with a least-squares model.

Quadratic  Programming  Superiorization  (QPS) 

Consider the system of equations as in (1) above, i.e., $Ax = d$, where $A$ is the dose matrix mapping the intensity of beamlets vector $x$ to the dose in voxels vector $d$, the total number of beamlets is $I$ and the total number of voxels is $J$. Further, in this new work we assume the notation and fundamentals that were used in equations (1)-(5) (not copied here). In the new formulation we propose the following changes: We suggest using the well-known least-squares model of (2) and not our previously suggested model of linear interval inequalities above. It is also typical to include a second term that is minimized together with the original objective function. Such a term (called a regularization term) provides the means for incorporating total variation, i.e.,

minimize $\sum_{s=1}^{S} \lambda_s \| A_s x - d^{(s)} \|^2 + \beta \, TV(x)$
subject to $x \geq 0$.  (12)

In this work we will not regularize the objective function but follow the superiorization framework as laid out in the pseudocode of the Superiorized Version of the Basic Algorithm (SVoBA). Instead we propose to perform TV superiorization on top of a quadratic programming (QP) algorithm that is intended for the least-squares model. In this way $\phi$ will be the TV function. Further, the “Basic Algorithm” in line 18 of the SVoBA will be the QP algorithm and not the feasibility-seeking algorithm that was used in the previous section. We next describe the QP algorithm in detail.

| Organ | Target? | Criterion | TV-Superiorization |
|---|---|---|---|
| I. Prostate | Yes | %vol > 79.2 Gy = 98 | %vol > 79.2 Gy = 98 |
| | | %vol > 84.744 Gy = 0 | %vol > 84.6 Gy = 0 |
| | | %vol > 75.24 Gy = 100 | %vol > 75.24 Gy = 100 |
| II. Rectum | No | %vol > 75 Gy < 15 | %vol > 75 Gy < 12.7 |
| | | %vol > 70 Gy < 25 | %vol > 70 Gy < 18.6 |
| | | %vol > 65 Gy < 35 | %vol > 65 Gy < 25.8 |
| | | %vol > 60 Gy < 50 | %vol > 60 Gy < 34.5 |
| III. Bladder | No | %vol > 80 Gy < 15 | %vol > 80 Gy < 2.2 |
| | | %vol > 75 Gy < 25 | %vol > 75 Gy < 4.9 |
| | | %vol > 70 Gy < 35 | %vol > 70 Gy < 6.8 |
| | | %vol > 65 Gy < 50 | %vol > 65 Gy < 8.7 |
| IV. Small Bowel | No | %vol > 52 Gy < 0 | %vol > 1.4 Gy < 0 |

Table 3: Results of the criteria for the TV-Superiorization algorithm.

The  QP  algorithm 

The QP algorithm was originally designed to solve (2) iteratively. In our framework, the algorithmic operator $A_C$ called in line 18 of the SVoBA is designed to be one iteration of the QP algorithm. There are many QP algorithms for solving (2). We decided to implement the Projected Landweber method [6, 7] since it fits the superiorization framework (more details below). Let us denote the quadratic objective function of (2), scaled by $1/2$ (which does not change its minimizers), by

$F(x) = \frac{1}{2} \sum_{s=1}^{S} \lambda_s \| A_s x - d^{(s)} \|^2$.  (13)

The gradient of $F$ can then be calculated as

$\nabla F(x) = \sum_{s=1}^{S} \lambda_s A_s^T ( A_s x - d^{(s)} )$,  (14)

where $A_s^T$ is the transpose of the submatrix $A_s$. In our pseudocode of SVoBA, when line 18 is reached, the algorithm should treat $y^{k,N}$ as the $x^k$ in the Landweber algorithm and calculate from it the next iterate according to (15), which, in turn, will be the $y^{k+1}$ of line 18 of the SVoBA. The Projected Landweber Method was proven to be perturbation resilient in [8].

The projected Landweber method

Initialization: $x^0 \in \mathbb{R}^I$ is arbitrary.

Iterative step: Given the current iterate $x^k$, calculate the next iterate $x^{k+1}$ by

$x^{k+1} = P_\Omega ( x^k - \tau_k \nabla F(x^k) )$  (15)

where $\Omega = \{ x \in \mathbb{R}^I \mid x_i \geq 0 \text{ for all } i = 1, 2, \ldots, I \}$ is the nonnegative orthant of $\mathbb{R}^I$ and $P_\Omega$ is the projection onto it, namely, for any point $z \in \mathbb{R}^I$,

$( P_\Omega(z) )_i = \max(z_i, 0) = \begin{cases} z_i, & \text{if } z_i \geq 0, \\ 0, & \text{if } z_i < 0. \end{cases}$  (16)

The stepsizes: The stepsizes $\tau_k$ should be chosen to be either diminishing steps or square summable steps.

| Organ | Target? | Criterion | QP-Superiorization |
|---|---|---|---|
| I. Prostate | Yes | %vol > 78 Gy = 95 | %vol > 78 Gy = 95 |
| | | %vol > 84.744 Gy = 0 | %vol > 84.70 Gy = 0 |
| | | %vol > 75.24 Gy = 100 | %vol > 76.02 Gy = 100 |
| II. Rectum | No | %vol > 75 Gy < 15 | %vol > 75 Gy < 2.4 |
| | | %vol > 70 Gy < 25 | %vol > 70 Gy < 3.1 |
| | | %vol > 65 Gy < 35 | %vol > 65 Gy < 3.6 |
| | | %vol > 60 Gy < 50 | %vol > 60 Gy < 6.4 |
| III. Bladder | No | %vol > 80 Gy < 15 | %vol > 80 Gy < 0 |
| | | %vol > 75 Gy < 25 | %vol > 75 Gy < 0.8 |
| | | %vol > 70 Gy < 35 | %vol > 70 Gy < 2.3 |
| | | %vol > 65 Gy < 50 | %vol > 65 Gy < 4.7 |
| IV. Small Bowel | No | %vol > 52 Gy < 0 | %vol > 1.2 Gy < 0 |

Table 4: Results of the criteria for the QP-Superiorization algorithm.
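A compact sketch of the projected Landweber iteration (15), using the gradient (14) and the projection (16), is given below. It is an illustration rather than the code used in this work: the stepsize rule is passed in as a function of the iteration index, and the constant step used in the toy run is an assumption chosen for simplicity, not the stepsize rule of the report. The names `gradient` and `projected_landweber` are hypothetical.

```python
def gradient(blocks, prescriptions, weights, x):
    """Gradient of F as in (14): sum_s lambda_s * A_s^T (A_s x - d^(s))."""
    grad = [0.0] * len(x)
    for A_s, d_s, lam_s in zip(blocks, prescriptions, weights):
        for row, goal in zip(A_s, d_s):
            residual = sum(a * xi for a, xi in zip(row, x)) - goal
            for i, a in enumerate(row):
                grad[i] += lam_s * a * residual
    return grad


def projected_landweber(blocks, prescriptions, weights, x0, stepsize, iterations):
    """Iterate x <- P_Omega(x - tau_k * grad F(x)), with P_Omega the clip at zero."""
    x = list(x0)
    for k in range(iterations):
        grad = gradient(blocks, prescriptions, weights, x)
        tau = stepsize(k)
        x = [max(xi - tau * g, 0.0) for xi, g in zip(x, grad)]
    return x


# Hypothetical toy problem: A is the identity and d = (2, -1), so the
# nonnegative least-squares solution is (2, 0).
identity = [[1.0, 0.0], [0.0, 1.0]]
solution = projected_landweber(
    [identity], [[2.0, -1.0]], [1.0], [0.0, 0.0], lambda k: 0.5, 50
)
```

In the toy problem the second component is held at zero by the projection, which is exactly the role of $P_\Omega$ in keeping beamlet intensities nonnegative.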

Results:

We report here on the results obtained for the same prostate patient data (which we reported on earlier using the TV-Superiorization method). As can be seen from the dose volume histogram in Figure 4 and from Table 4, the algorithm was able to produce acceptable results and meet all the criteria. It needed 20 iterations to reach this result.

Figure 4: DVH plots for the newly proposed Quadratic Programming Superiorization (QPS).

Additional tasks that were completed by the PI during the award (not included in the SOW):

• Implementation of the algorithms and code and its availability to the community: The mathematical foundation behind superiorization will be available to the community in the package framework of SNARK09 in the context of image reconstruction from projections. A paper that summarizes the work was published in the journal Computer Methods and Programs in Biomedicine; see [9] for further details (also attached to the appendix).

• The PI participated in a collaborative effort to incorporate the method of superiorization into prostate proton therapy and proton CT imaging techniques. The paper entitled “200 MeV Proton Radiography Studies with a Hand Phantom Using a Prototype Proton CT Scanner, IEEE Trans Med Imaging 2013” provides the reasoning behind proton CT and its usefulness for treating cancer with protons as opposed to photons. The acknowledgment section specifies the contribution of the PI and acknowledges the grant [10].

• The PI participated in the development and planning of an Intensity Modulated Proton Therapy (IMpRT) inverse planning method which incorporates principles of the work developed during this award. The attached IMpRT proposal summarizes future aspects of this collaboration and continuation effort among multiple institutes and how it may enrich the career path of the PI.

Training  Accomplishments 

Task  8:  Seminar,  lectures  and  meetings 
Task  9:  Research  training 
Task  10:  Clinical  training 

During the training award the PI attended regular meetings, seminars and journal clubs with presentations on topics related to radiation therapy treatment planning. Presentations by visiting scholars and professionals were also available throughout the year and enriched his knowledge of the topic. The PI was trained in the clinic to operate the Eclipse and Aria system stations for treatment planning available at Stanford Cancer Center (Eclipse and Aria are commercial tools for treatment planning developed by Varian Medical Systems). He collaborated with radiation oncologists, radiation therapists, physicists and dosimetrists and obtained first-hand knowledge and experience of the process of prostate radiation treatment planning.


Key  Research  Accomplishments 

•  Formulated  the  VMAT  treatment  planning  as  a  constrained  superiorization  problem. 

•  Developed  a  framework  for  fast  converging  inverse  planning  superiorization  techniques. 

• Derived the necessary conditions of the superiorization framework for VMAT treatment planning.

•  Placed  the  newly  developed  concepts  on  a  firm  mathematical  ground. 

•  Implemented  and  tested  the  new  superiorization  framework  and  showed  good  initial  results. 

• Developed, implemented and tested a new modality based on Quadratic Programming and incorporated it into the superiorization framework.

•  Participated  in  additional  collaborations  for  using  the  developed  superiorization  framework  in  other 
applications  including:  image  reconstruction,  proton  CT  and  proton  radiation  therapy. 

•  Trained  in  treating  prostate  cancer  as  it  is  done  in  the  clinic  today. 

•  Invited  speaker  to  international  meetings  and  workshops. 
Reportable Outcomes

•  Four  journal  publications  were  submitted.  Three  appeared  and  the  fourth  was  accepted: 

1. G.T. Herman, E. Garduño, R. Davidi and Y. Censor, Superiorization: An optimization heuristic for medical physics, Medical Physics 39 (2012), 5532-5546.

2. Y. Censor, R. Davidi, G.T. Herman, R.W. Schulte and L. Tetruashvili, Projected subgradient minimization versus superiorization, Journal of Optimization Theory and Applications, DOI 10.1007/s10957-013-0408-3, 2013.

3. J. Klukowska, R. Davidi and G.T. Herman, SNARK09 - A software package for the reconstruction of 2D images from 1D projections, Computer Methods and Programs in Biomedicine, 110:424-440, 2013.

4. R. Davidi, Y. Censor, R.W. Schulte, S. Geneser and L. Xing, Feasibility-Seeking and Superiorization Algorithms Applied to Inverse Treatment Planning in Radiation Therapy, Contemporary Mathematics, (to appear), 2014.

•  The  above  work  has  been  accepted  for  presentation  at  the  joint  workshop  sponsored  by  the  American 
Society  for  Therapeutic  Radiology  and  Oncology  (ASTRO),  the  National  Cancer  Institute  (NCI)  and 
the American Association of Physicists in Medicine (AAPM), June 13-14, 2013, National Institutes of
Health,  Bethesda,  MD,  USA. 

•  The  above  work  has  been  accepted  for  presentation  at  the  workshop  on  Projection  Methods:  Theory 
and  Practice,  June  19-21,  2013,  Fraunhofer  Institute  for  Industrial  Mathematics  ITWM,  Kaiserslautern, 
Germany. 

• The above work has been accepted for presentation at the meeting on Projection Methods in
Feasibility, Superiorization and Optimization, December 19, 2013, Center for Mathematics and Scientific
Computation  (CMSC)  and  the  Caesarea  Rothschild  Institute  (CRI)  for  Interdisciplinary  Applications 
of  Computer  Science  at  the  University  of  Haifa,  Mt.  Carmel,  Israel. 


Conclusion 

We were able to extend the superiorization methodology into a larger framework, one that is more realistic
from the point of view of the application at hand. By taking into account the discrepancy that exists between
theory and practice and incorporating it into our model, we minimized potential issues that typically appear
when a theory is applied to a real-life application.

Superiorization was developed to be a general tool for medical physics applications. It is capable of
turning any iterative algorithm that tries to satisfy a set of constraints into one that is also capable of
superiorizing an objective function. The work that came out of this research can help other applications
that use optimization methods as the main tool. Examples include X-ray CT using image reconstruction
from projections, proton CT that uses superiorization as the iterative engine to superiorize an objective
function, and the use of its efficacy for implementing fast converging techniques in proton radiation therapy.

We tailored the developed methodology specifically to solve the problem of IMRT and VMAT inverse
planning in radiation therapy. The initial results obtained on a realistic prostate case were satisfactory
and give a good indication that superiorization works and can be applied to radiation treatment planning
problems. We further extended the framework to include quadratic programming and provided the means
to use superiorization with this approach.

Finally, we implemented, tested and evaluated our new framework on prostate cases. We performed a
thorough investigation as detailed in the SOW and reported our results in the literature and in meetings.

Purpose: To describe and mathematically validate the superiorization methodology, which is a
recently developed heuristic approach to optimization, and to discuss its applicability to medical physics
problem formulations that specify the desired solution (of physically given or otherwise obtained
constraints) by an optimization criterion.

Methods:  The  superiorization  methodology  is  presented  as  a  heuristic  solver  for  a  large  class  of 
constrained  optimization  problems.  The  constraints  come  from  the  desire  to  produce  a  solution  that 
is  constraints-compatible,  in  the  sense  of  meeting  requirements  provided  by  physically  or  otherwise 
obtained  constraints.  The  underlying  idea  is  that  many  iterative  algorithms  for  finding  such  a  solution 
are  perturbation  resilient  in  the  sense  that,  even  if  certain  kinds  of  changes  are  made  at  the  end  of  each 
iterative  step,  the  algorithm  still  produces  a  constraints-compatible  solution.  This  property  is  exploited 
by  using  permitted  changes  to  steer  the  algorithm  to  a  solution  that  is  not  only  constraints-compatible, 
but is also desirable according to a specified optimization criterion. The approach is very general: it
is applicable to many iterative procedures and optimization criteria used in medical physics.

Results:  The  main  practical  contribution  is  a  procedure  for  automatically  producing  from  any  given 
iterative  algorithm  its  superiorized  version,  which  will  supply  solutions  that  are  superior  according 
to  a  given  optimization  criterion.  It  is  shown  that  if  the  original  iterative  algorithm  satisfies  certain 
mathematical conditions, then the output of its superiorized version is guaranteed to be as
constraints-compatible as the output of the original algorithm, but it is superior to the latter according
to the optimization criterion. This intuitive description is made precise in the paper and the stated claims
are rigorously proved. Superiorization is illustrated on simulated computerized tomography data of
a head cross section and, in spite of its generality, superiorization is shown to be competitive with an
optimization algorithm that is specifically designed to minimize total variation.

Conclusions:  The  range  of  applicability  of  superiorization  to  constrained  optimization  problems  is 
very  large.  Its  major  utility  is  in  the  automatic  nature  of  producing  a  superiorization  algorithm  from  an 
algorithm  aimed  at  only  constraints-compatibility;  while  nonheuristic  (exact)  approaches  need  to  be 
redesigned  for  a  new  optimization  criterion.  Thus  superiorization  provides  a  quick  route  to  algorithms 
for the practical solution of constrained optimization problems. © 2012 American Association of
Physicists in Medicine. [http://dx.doi.org/10.1118/1.4745566]

Key  words:  superiorization,  constrained  optimization,  heuristic  optimization,  tomography,  total 
variation 


I.  INTRODUCTION 

Optimization is a tool that is used in many areas of Medical Physics. Prime examples are radiation
therapy treatment planning and tomographic reconstruction, but there are others such as image
registration. Some well-cited classical publications on the topic are Refs. 1-12 and some recent
articles are Refs. 13-26.

In a typical medical physics application, one uses constrained optimization, where the constraints
come from the desire to produce a solution that is constraints-compatible, in the sense of meeting
the requirements provided by physically or otherwise obtained constraints. In radiation therapy
treatment planning, the requirements are usually in the form of constraints prescribed by the
treatment planner on the doses to be delivered at specific locations in the body. These doses in turn
depend on information provided by an imaging instrument, typically a magnetic resonance imaging
(MRI) or a computerized tomography (CT) scanner. In tomography, the constraints come from the
detector readings of the instrument. In such applications, it is typically the case that a large number
of solutions would be considered good enough from the point of view of being constraints-compatible;
this is to a large extent, but not entirely, due to the fact that there is uncertainty as to the exact
nature of the constraints (for example, due to noise in the data collection). In such a case, an
optimization criterion is introduced that helps us to distinguish the “better” constraints-compatible
solutions (for example, this criterion could be the total dose to be delivered to the body, which may
vary quite a bit between radiation therapy treatment plans that are compatible with the constraints
on the doses delivered to individual locations).

The superiorization methodology (see, for example, Refs. 22 and 27-32) is a recently developed
heuristic approach to optimization. The word heuristic is used here in the sense that the process is
not guaranteed to lead to an optimum according to the given criterion; approaches aimed at processes
that are guaranteed in that sense are usually referred to as exact. Heuristic approaches have been
found useful in practical applications of optimization, mainly because they are often computationally
much less expensive than their exact counterparts, but nevertheless provide solutions that are
appropriate for the application at hand.33-35

The underlying idea of the superiorization approach is the following. In many applications there
exists a computationally efficient iterative algorithm that produces a constraints-compatible
solution for the given constraints. (An example of this for radiation therapy treatment planning is
reported in Ref. 36; its clinical use is discussed in Ref. 15.) Furthermore, often the algorithm is
perturbation resilient in the sense that, even if certain kinds of changes are made at the end of each
iterative step, the algorithm still produces a constraints-compatible solution.27-30 This property is
exploited in the superiorization approach by using such perturbations to steer the algorithm to a
solution that is not only constraints-compatible, but is also desirable according to a specified
optimization criterion. The approach is very general: it is applicable to many iterative procedures
and optimization criteria.

The current paper presents a major advance in the practice and theory of superiorization. The
previous publications22,27-32 used the intuitive idea to present some superiorization algorithms; in
this paper the reader will find a totally automatic procedure that turns an iterative algorithm into
its superiorized version. This version will produce an output that is as constraints-compatible as the
output of the original algorithm, but it is superior to that according to an optimization criterion.
This claim is mathematically shown to be true for a very large class of iterative algorithms and for
optimization criteria in general; typical restrictions (such as convexity) on the optimization
criterion are not essential for the material presented below. In order to make precise and validate
this broad claim, we present here a new theoretical framework. The framework of Ref. 29 is a
precursor of what we present here, but it is a restricted one, since it assumes that the constraints
can be all satisfied simultaneously, which is often false in medical physics applications. There is no
such restriction in the presentation below.


The idea of designing algorithms that use interlacing steps of two different kinds (in our case, one
kind of step aims at constraints-compatibility and the other kind of step aims at improvement of the
optimization criterion) is well-established and, in fact, is made use of in many approaches that have
been proposed with exact constrained optimization in mind; see, for example, the works of Helou Neto
and De Pierro,38 Nurminski,39 Combettes and co-workers,40,41 Sidky and co-workers,23,42,43 and
Defrise and co-workers.44 However, none of these approaches can do what can be done by the
superiorization approach as presented below, namely, the automatic production of a heuristic
constrained optimization algorithm from an iterative algorithm for constraints-compatibility. For
example, in Ref. 37 it is assumed (just as in the theory presented in our Ref. 29) that all the
constraints can be satisfied simultaneously.

A major motivator for the additional theory presented in the current paper is to get rid of this
assumption, which is not reasonable when handling real problems of medical physics. Motivated by
similar considerations, Helou Neto and De Pierro38 present an alternative approach that does not
require this unreasonable assumption. However, in order to solve such a problem, they end up with
iterative algorithms of a particular form rather than having the generality of being able to turn any
constraints-compatibility seeking algorithm into a superiorized one capable of handling constrained
optimization. Also, the assumptions they have to make in order to prove their convergence result
(their Theorem 15) indicate that their approach is applicable to a smaller class of constrained
optimization problems than the superiorization approach, whose applicability seems to be more
general. However, for the mathematical purist, we point out that they present an exact constrained
optimization algorithm, while superiorization is a heuristic approach. Whether this is relevant to
medical physics practice is not clear: exact algorithms are not run forever, but are stopped according
to some stopping-rule; the relevant questions in comparing two algorithms are the quality of the
actual output and the computation time needed to obtain it.

Ultimately, the quality of the outputs should be evaluated by some figures of merit relevant to the
medical task at hand. An example of a careful study of this kind that involves superiorization is in
Sec. 4.3 of Ref. 30, which reports on comparing in CT the efficacy of constrained optimization
reconstruction algorithms for the detection of low-contrast brain tumors by using the method of
statistical hypothesis testing (which provides a P-value that indicates the significance by which we
can reject the null hypothesis that the two algorithms are equally efficacious in favor of the
alternative that one is preferable). Such studies bundle together two things: (i) the formulation of
the constrained optimization task and (ii) the performance of the algorithm in performing that task.
The first of these requires a translation of the medical aim into a mathematical model; it is
important that this model should be appropriately chosen.

The superiorization approach is not about choosing this model; it kicks in once the model is chosen
and aims at producing an output that is “good” according to the mathematical specifications of the
constraints and of the optimization criterion. Thus superiorization has been used to compare the
effects on the quality of the output in CT when the optimization criterion is specified by total
variation (TV) versus by entropy2* or versus by the $\ell_1$-norm of the Haar transform.32 However,
the current paper is not about discussing how to translate the underlying medical physics task into a
constrained optimization problem. For our purposes here, we are assuming that the mathematical model
has been worked out and concentrate on the algorithmic approach for solving the resulting constrained
optimization problem. We claim that the evaluation of such algorithms should not be based on the
medical figures of merit mentioned at the beginning of the previous paragraph, but rather on their
performance in solving the mathematical problem. If “good” solutions to the constrained optimization
problem are not medically efficacious, that indicates that something is wrong with the mathematical
model and not that something is wrong with the algorithmic approach. For this reason, in this paper we
will not carry out a careful investigation of the medical efficacy of any algorithm in the manner that
we have done in Sec. 4.3 of Ref. 30, but will restrict ourselves to a simple illustration of the
performance of the superiorization approach as compared to the previously published algorithm of
Ref. 42 that is aimed at performing exact minimization.

Examples of such studies already exist. Superiorization was compared in Ref. 27 with Algorithm 6 of
Ref. 40 and in Ref. 45 with the algorithm of Goldstein and Osher that they refer to as TwIST (Ref. 46)
with split Bregman47 as the substep. In both cases the implementation was done by the proposers of
the algorithms. In these reported instances superiorization did well: the constraints-compatibility
and the value of the function to be minimized were very similar for the outputs produced by the
algorithms being compared, but the superiorization algorithm produced its output four times faster
than the alternative. It would be unjustified to draw any general conclusions on the mathematical
performance and speed of superiorization based on just a few experiments, but the reported results are
encouraging.

However, the main reason why we advocate superiorization is different from what is discussed above.
The reason why we claim it to be helpful in medical physics research is that it has the potential of
saving a lot of time and effort for the researcher. Let us consider a historical example. Likelihood
optimization using the iterative process of expectation maximization (EM) (Ref. 48) gained immediate
and wide acceptance in the emission tomography community. It was observed that irregular high
amplitude patterns occurred in the image with a large number of iterations, but it was not until five
years later that this problem was corrected49 by the use of a maximum a posteriori probability (MAP)
algorithm with a multivariate Gaussian prior. Had we had at our disposal the superiorization approach,
then the introduction of an optimization criterion (Gaussian or other) into the iterative EM process
would have been a simple matter and we would have saved the time and effort spent on designing a
special purpose algorithm for the MAP formulation. A TV-superiorization of the EM algorithm is
presented in Ref. 50.

Even though our major claim for superiorization is that it provides a quick route to algorithms for
the practical solution of constrained optimization problems, before leaving this introduction let us
bring up a question that has to do with the performance of the resulting algorithms: Will
superiorization produce superior results to those produced by contemporary MAP methods or is it
faster than the better of such methods? At this stage we have not yet developed the mathematical
notation to discuss this question in a rigorous manner; we return to it in Subsection II.F.

In Sec. II, we present in detail the superiorization methodology. In Sec. III, we provide an
illustrative example by reporting on reconstructions produced by algorithms applied to simulated
computerized tomography data of a head cross section. In the final section, we discuss our results and
present our conclusions.

II.  THE  SUPERIORIZATION  METHODOLOGY 

II.A. Problem sets, proximity functions, and $\varepsilon$-compatibility

Although optimization is often studied in a more general context (such as in Hilbert or Banach
spaces), in medical physics we usually deal with a special case, where optimization is performed in a
Euclidean space $\mathbb{R}^J$ (the space of $J$-dimensional vectors of real numbers, where $J$ is a
positive integer). As often appropriate in practice, we further restrict the domain of optimization to
a nonempty subset $\Omega$ of $\mathbb{R}^J$ (such as the non-negative orthant $\mathbb{R}^J_+$ that
consists of vectors all of whose components are non-negative).

We now turn to formalizing the notion of being compatible with given constraints, a notion that we
have used informally in Sec. I. In any application, there is a problem set $\mathbb{T}$; each problem
$T \in \mathbb{T}$ is essentially a description of the constraints in that particular case. For
example, for a tomographic scanner, the problem of reconstruction for a particular patient at a
particular time is determined by the measurements taken by the scanner for that patient at that time.
The intuitive notion of constraints-compatibility is formalized by the use of a proximity function
$\mathcal{P}r$ on $\mathbb{T}$ such that, for every $T \in \mathbb{T}$, $\mathcal{P}r_T$ maps $\Omega$
into $\mathbb{R}_+$, the set of non-negative real numbers; i.e.,
$\mathcal{P}r_T : \Omega \to \mathbb{R}_+$. Intuitively, we think of $\mathcal{P}r_T(x)$ as an
indicator of how incompatible $x$ is with the constraints of $T$. For example, in tomography,
$\mathcal{P}r_T(x)$ should indicate by how much a proposed reconstruction that is described by an $x$
in $\Omega$ violates the constraints of the problem $T$ that are provided by the measurements taken by
the scanner. For example, if we use $b$ to denote the vector of estimated line integrals based on the
measurements obtained by the scanner and $A$ to denote the system matrix of the scanner, then a
possible choice for the proximity function is the norm-distance $\|b - Ax\|$, which we will use as an
example in the discussions that follow. An alternative legitimate choice for the proximity function is
the Kullback-Leibler distance $KL(b, Ax)$, which is the negative log-likelihood of a statistical model
in tomography. The special case $\mathcal{P}r_T(x) = 0$ is interpreted by saying that $x$ is perfectly
compatible with the constraints; due to the presence of noise in practical applications, it is quite
conceivable that there is no $x$ that is perfectly compatible with the constraints, and we accept an
$x$ as constraints-compatible as long as the value of $\mathcal{P}r_T(x)$ is considered to be small
enough to justify that decision. Combining these two concepts leads to the notion of a problem
structure, which is a pair $\langle \mathbb{T}, \mathcal{P}r \rangle$, where $\mathbb{T}$ is a
nonempty problem set and $\mathcal{P}r$ is a proximity function on $\mathbb{T}$. For a problem
structure $\langle \mathbb{T}, \mathcal{P}r \rangle$, a problem $T \in \mathbb{T}$, a non-negative
$\varepsilon$, and an $x \in \Omega$, we say that $x$ is $\varepsilon$-compatible with $T$ provided
that $\mathcal{P}r_T(x) \le \varepsilon$.
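
As a small illustration of these definitions, the following sketch evaluates the norm-distance proximity function and tests for compatibility within a tolerance. It is only a sketch under stated assumptions: the function names `proximity` and `is_eps_compatible`, and the two-equation system with its numbers, are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def proximity(A: Matrix, b: Vector, x: Vector) -> float:
    """Norm-distance ||b - A x||: how strongly x violates the constraints."""
    residuals = [b_i - sum(a_ij * x_j for a_ij, x_j in zip(row, x))
                 for row, b_i in zip(A, b)]
    return math.sqrt(sum(r * r for r in residuals))


def is_eps_compatible(A: Matrix, b: Vector, x: Vector, eps: float) -> bool:
    """x is eps-compatible when its proximity value does not exceed eps."""
    return proximity(A, b, x) <= eps


# Hypothetical system matrix and measurement vector (illustrative values only).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 4.0]

print(proximity(A, b, [0.0, 0.0]))               # 5.0
print(is_eps_compatible(A, b, [3.0, 4.1], 0.2))  # small violation, accepted
print(is_eps_compatible(A, b, [0.0, 0.0], 0.2))  # large violation, rejected
```

The tolerance plays the role of the non-negative threshold in the definition: with noisy data one would not expect a proximity value of exactly zero, so the threshold encodes what is considered small enough.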

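As an example (whose applicability to tomographic reconstruction is illustrated in Sec. III),
consider the problem structure that arises from the desire to find non-negative solutions of sequences
of blocks of linear equations. Then the appropriate choices are $\Omega = \mathbb{R}^J_+$ and the
problem structure is $\langle \mathbb{S}, \mathrm{Res} \rangle$, where the problem set $\mathbb{S}$ is

$$
\begin{aligned}
\mathbb{S} = \Big\{ \Big( &\{(a^1, b_1), \ldots, (a^{\ell_1}, b_{\ell_1})\}, \ \ldots, \\
&\{(a^{\ell_1+\cdots+\ell_{W-1}+1}, b_{\ell_1+\cdots+\ell_{W-1}+1}), \ldots,
(a^{\ell_1+\cdots+\ell_W}, b_{\ell_1+\cdots+\ell_W})\} \Big) \ \Big| \\
&W \text{ is a positive integer and, for } 1 \le w \le W,\ \ell_w \text{ is a positive integer and,} \\
&\text{for } 1 \le i \le \ell_1+\cdots+\ell_W,\ a^i \in \mathbb{R}^J \text{ and } b_i \in \mathbb{R} \Big\}
\end{aligned}
\tag{1}
$$

and the proximity function $\mathrm{Res}$ on $\mathbb{S}$ is defined, for any problem
$S = \big( \{(a^1, b_1), \ldots, (a^{\ell_1}, b_{\ell_1})\}, \ldots,
\{(a^{\ell_1+\cdots+\ell_{W-1}+1}, b_{\ell_1+\cdots+\ell_{W-1}+1}), \ldots,
(a^{\ell_1+\cdots+\ell_W}, b_{\ell_1+\cdots+\ell_W})\} \big)$ in $\mathbb{S}$ and for any
$x \in \Omega$, by

$$
\mathrm{Res}_S(x) = \sqrt{\sum_{i=1}^{\ell_1+\cdots+\ell_W} \left( b_i - \langle a^i, x \rangle \right)^2}.
\tag{2}
$$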

Note that each element of this problem set $\mathbb{S}$ specifies an ordered sequence of $W$ blocks of
linear equations of the form $\langle a^i, x \rangle = b_i$, where $\langle \cdot, \cdot \rangle$
denotes the inner product in $\mathbb{R}^J$ (and thus $\mathbb{S}$ is an appropriate representation of
the so-called “ordered subsets” approach to tomographic reconstruction,51 as well as of other
earlier-published block-iterative methods that proposed essentially the same idea52-54). The proximity
function $\mathrm{Res}$ on $\mathbb{S}$ is the residual that we get when a particular $x$ is
substituted into all the equations of a particular problem $S$.
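
The block structure of a problem and its residual can be sketched in a few lines. This is an illustration under assumptions: the type aliases, the function name `res`, and the three-equation example are hypothetical; only the formula (the square root of the summed squared residuals over all equations of all blocks) is taken from Eq. (2).

```python
import math
from typing import List, Sequence, Tuple

Equation = Tuple[Sequence[float], float]   # one pair (a^i, b_i)
Block = List[Equation]                     # one block of equations
Problem = List[Block]                      # ordered sequence of W blocks


def inner(a: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two vectors of equal length."""
    return sum(a_j * x_j for a_j, x_j in zip(a, x))


def res(problem: Problem, x: Sequence[float]) -> float:
    """Residual of Eq. (2), accumulated over every equation of every block."""
    total = 0.0
    for block in problem:
        for a, b in block:
            total += (b - inner(a, x)) ** 2
    return math.sqrt(total)


# Hypothetical problem with W = 2 blocks of sizes 2 and 1.
problem = [
    [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)],
    [([1.0, 1.0], 3.0)],
]

print(res(problem, [1.0, 2.0]))   # all three equations hold exactly
print(res(problem, [0.0, 0.0]))   # sqrt(1 + 4 + 9)
```

Note that the residual does not depend on how the equations are grouped into blocks; the grouping matters only for the iterative algorithm that processes the blocks in order.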


II.B. Algorithms and outputs

We now define the concept of an algorithm in the general context of problem structures. For technical
reasons that will become clear as we proceed with our development, we introduce an additional set
$\Delta$, such that $\Omega \subseteq \Delta \subseteq \mathbb{R}^J$. (Both $\Omega$ and $\Delta$ are
assumed to be known and fixed for any particular problem structure
$\langle \mathbb{T}, \mathcal{P}r \rangle$.) An algorithm $\mathbf{P}$ for a problem structure
$\langle \mathbb{T}, \mathcal{P}r \rangle$ assigns to each problem $T \in \mathbb{T}$ an operator
$\mathbf{P}_T : \Delta \to \Omega$. This definition is used to define iterative processes that, for any
initial point $x \in \Omega$, produce the (potentially) infinite sequence
$\left( (\mathbf{P}_T)^k x \right)_{k=0}^{\infty}$ (that is, the sequence
$x, \mathbf{P}_T x, \mathbf{P}_T(\mathbf{P}_T x), \ldots$) of points in $\Omega$. We discuss below how
such a potentially infinite process is terminated in practice.

Selecting $\Omega = \mathbb{R}^J_+$ and $\Delta = \mathbb{R}^J$ for the problem structure
$\langle \mathbb{S}, \mathrm{Res} \rangle$ of Subsection II.A, an example of an algorithm
$\mathbf{R}$ is specified by

$$
\mathbf{R}_S x = \mathbf{Q} \mathbf{B}_{S_W} \cdots \mathbf{B}_{S_1} x,
\tag{3}
$$

where $S$ is the problem specified above in connection with Eq. (2) and, for $1 \le w \le W$,
$\mathbf{B}_{S_w} : \Delta \to \Delta$ is defined by

$$
\mathbf{B}_{S_w} x = x + \frac{1}{\ell_w}
\sum_{i=\ell_1+\cdots+\ell_{w-1}+1}^{\ell_1+\cdots+\ell_w}
\frac{b_i - \langle a^i, x \rangle}{\|a^i\|^2} \, a^i,
\tag{4}
$$

where $\|a\|$ denotes the norm of the vector $a$ in $\mathbb{R}^J$, and
$\mathbf{Q} : \Delta \to \Omega$ is defined by

$$
(\mathbf{Q} x)_j = \max\{0, x_j\}, \quad \text{for } 1 \le j \le J.
\tag{5}
$$

Note that $\mathbf{R}_S : \Delta \to \Omega$. This specific algorithm $\mathbf{R}$ is a typical
example of the so-called block-iterative methods mentioned above. Except for the presence of
$\mathbf{Q}$ in Eq. (3), which enforces non-negativity of the components, it is identical to an
algorithm used and illustrated in Ref. 31. With the $\mathbf{Q}$ absent from the definition of the
algorithm, $\Omega$ has to be the whole of $\mathbb{R}^J$; the practical consequence of the presence
versus the absence of $\mathbf{Q}$ in the tomographic application is illustrated in Subsection III.D.
We also note that special cases of the presented algorithm include the classical reconstruction
methods such as algebraic reconstruction technique (ART) (if $\ell_w = 1$, for $1 \le w \le W$) and
SIRT (if $W = 1$); see, for example, Chaps. 11 and 12 of Ref. 55.

For a problem structure $\langle \mathbb{T}, \mathcal{P}r \rangle$, a $T \in \mathbb{T}$, an
$\varepsilon \in \mathbb{R}_+$, and a sequence $R = (x^k)_{k=0}^{\infty}$ of points in $\Omega$, we use
$O(T, \varepsilon, R)$ to denote the $x \in \Omega$ that has the following properties:
$\mathcal{P}r_T(x) \le \varepsilon$ and there is a non-negative integer $K$ such that $x^K = x$ and,
for all non-negative integers $k < K$, $\mathcal{P}r_T(x^k) > \varepsilon$. Clearly, if there is such
an $x$, then it is unique. If there is no such $x$, then we say that $O(T, \varepsilon, R)$ is
undefined, otherwise we say that it is defined. The intuition behind this definition is the following:
if we think of $R$ as the (infinite) sequence of points that is produced by an algorithm (intended for
the problem $T$) without a termination criterion, then $O(T, \varepsilon, R)$ is the output produced
by that algorithm when we add to it instructions that make it terminate as soon as it reaches a point
that is $\varepsilon$-compatible with $T$.
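
To make the block-iterative operator of Eqs. (3)-(5) and the notion of an output concrete, here is a minimal sketch. It rests on several assumptions that are not in the paper: the names `block_step`, `clip_nonneg`, `sweep`, `output` and `iterates` are hypothetical, the two-equation problem is invented for illustration, and the `max_iter` cap is a practical safeguard added here (the mathematical output is simply undefined when no compatible point is ever reached).

```python
import math
from itertools import islice
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Vector = List[float]
Block = List[Tuple[Sequence[float], float]]   # equations (a^i, b_i)


def inner(a: Sequence[float], x: Sequence[float]) -> float:
    return sum(a_j * x_j for a_j, x_j in zip(a, x))


def block_step(block: Block, x: Vector) -> Vector:
    """Eq. (4): add to x the average of its corrections toward the block's hyperplanes."""
    out = list(x)
    for a, b in block:
        coeff = (b - inner(a, x)) / (inner(a, a) * len(block))
        for j, a_j in enumerate(a):
            out[j] += coeff * a_j
    return out


def clip_nonneg(x: Vector) -> Vector:
    """Eq. (5): replace negative components by zero."""
    return [max(0.0, x_j) for x_j in x]


def sweep(problem: List[Block], x: Vector) -> Vector:
    """Eq. (3): apply the block steps in order, then enforce non-negativity."""
    for block in problem:
        x = block_step(block, x)
    return clip_nonneg(x)


def res(problem: List[Block], x: Vector) -> float:
    """Residual of Eq. (2)."""
    return math.sqrt(sum((b - inner(a, x)) ** 2 for block in problem for a, b in block))


def iterates(problem: List[Block], x: Vector) -> Iterator[Vector]:
    """The unterminated sequence x, R x, R(R x), ..."""
    while True:
        yield x
        x = sweep(problem, x)


def output(prox: Callable[[Vector], float], eps: float,
           points: Iterator[Vector], max_iter: int = 10_000) -> Optional[Vector]:
    """First point with prox(x) <= eps; None if none is found within max_iter points."""
    for x in islice(points, max_iter):
        if prox(x) <= eps:
            return x
    return None


# Hypothetical problem: two blocks of one equation each (the ART special case).
problem = [[([1.0, 0.0], 1.0)], [([1.0, 1.0], 3.0)]]
x_out = output(lambda x: res(problem, x), 1e-6, iterates(problem, [0.0, 0.0]))
print(x_out, res(problem, x_out))
```

With one equation per block this reduces to a Kaczmarz-style sweep followed by clipping; putting all equations into a single block would give the simultaneous (SIRT-like) variant instead.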


II.C. Bounded perturbation resilience

The notion of a bounded perturbations resilient algorithm $\mathbf{P}$ for a problem structure
$\langle \mathbb{T}, \mathcal{P}r \rangle$ has been defined in a mathematically precise manner.2 1
However, that definition is not satisfactory from the point of view of applications in medical physics
(or indeed in any area involving noisy data), because it is useful only for problems $T$ for which
there is a perfectly compatible solution (that is, an $x$ such that $\mathcal{P}r_T(x) = 0$). We
therefore extend here that notion as follows. An algorithm $\mathbf{P}$ for a problem structure
$\langle \mathbb{T}, \mathcal{P}r \rangle$ is said to be strongly perturbation resilient if, for all
$T \in \mathbb{T}$,

(i) there exists an $\varepsilon \in \mathbb{R}_+$ such that
$O\left(T, \varepsilon, \left( (\mathbf{P}_T)^k x \right)_{k=0}^{\infty}\right)$ is defined for every
$x \in \Omega$;

(ii) for all $\varepsilon \in \mathbb{R}_+$ such that
$O\left(T, \varepsilon, \left( (\mathbf{P}_T)^k x \right)_{k=0}^{\infty}\right)$ is defined for every
$x \in \Omega$, we also have that $O(T, \varepsilon', R)$ is defined for every
$\varepsilon' > \varepsilon$ and for every sequence $R = (x^k)_{k=0}^{\infty}$ of points in $\Omega$
generated by

$$
x^{k+1} = \mathbf{P}_T\left( x^k + \beta_k v^k \right), \quad \text{for all } k \ge 0,
\tag{6}
$$

where $\beta_k v^k$ are bounded perturbations, meaning that the sequence
$(\beta_k)_{k=0}^{\infty}$ of non-negative real numbers is summable (that is,
$\sum_{k=0}^{\infty} \beta_k < \infty$), the sequence $(v^k)_{k=0}^{\infty}$ of vectors in
$\mathbb{R}^J$ is bounded and, for all $k \ge 0$, $x^k + \beta_k v^k \in \Delta$.

In less formal terms, the second of these properties says that for a strongly perturbation resilient
algorithm we have that, for every problem and any non-negative real number $\varepsilon$, if it is the
case that for all initial points from $\Omega$ the infinite sequence produced by the algorithm
contains an $\varepsilon$-compatible point, then it will also be the case that all perturbed sequences
satisfying Eq. (6) contain an $\varepsilon'$-compatible point, for any $\varepsilon' > \varepsilon$.
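
The perturbed iteration of Eq. (6) can be tried out numerically. The sketch below is an assumption-laden illustration, not the superiorized algorithm of the paper: the operator is a stand-in (projection onto one hypothetical hyperplane followed by clipping), the step sizes are the geometric sequence 0.5^k (summable, with sum 2), and the directions are the negative normalized gradient of a hypothetical criterion phi(x) = 0.5 ||x||^2 (bounded, with norm at most 1).

```python
import math
from typing import List

Vector = List[float]

A_ROW, B_VAL = [1.0, 1.0], 2.0   # hypothetical single constraint x1 + x2 = 2


def operator(x: Vector) -> Vector:
    """Stand-in for the algorithmic operator: hyperplane projection, then clipping."""
    coeff = (B_VAL - sum(a * v for a, v in zip(A_ROW, x))) / sum(a * a for a in A_ROW)
    return [max(0.0, v + coeff * a) for v, a in zip(x, A_ROW)]


def prox(x: Vector) -> float:
    """Proximity value: absolute residual of the single constraint."""
    return abs(B_VAL - sum(a * v for a, v in zip(A_ROW, x)))


def phi(x: Vector) -> float:
    """Hypothetical optimization criterion used to choose the perturbations."""
    return 0.5 * sum(v * v for v in x)


def direction(x: Vector) -> Vector:
    """Bounded direction: negative normalized gradient of phi (zero at the origin)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [0.0] * len(x) if norm == 0.0 else [-v / norm for v in x]


def run(x: Vector, steps: int, perturb: bool) -> Vector:
    """Iterate Eq. (6); with perturb=False this is the unperturbed algorithm."""
    for k in range(steps):
        if perturb:
            beta = 0.5 ** k                     # summable step sizes
            v = direction(x)
            x = [x_j + beta * v_j for x_j, v_j in zip(x, v)]
        x = operator(x)
    return x


x_plain = run([3.0, 0.0], 60, perturb=False)
x_pert = run([3.0, 0.0], 60, perturb=True)
print(prox(x_plain), phi(x_plain))
print(prox(x_pert), phi(x_pert))
```

In this toy run both sequences end at points with a negligible proximity value, while the perturbed one ends at a point with a smaller value of the criterion; that is the behavior the definition is designed to permit, although a single example of course proves nothing in general.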

Having defined the notion of a strongly perturbation resilient algorithm, we next show that this
notion is of relevance to problems in medical physics. We illustrate the use of this in tomography in
Sec. III. We first need to introduce some mathematical concepts.

Given an algorithm $\mathbf{P}$ for a problem structure $\langle \mathbb{T}, \mathcal{P}r \rangle$
and a $T \in \mathbb{T}$, we say that $\mathbf{P}$ is convergent for $T$ if, for every $x \in \Omega$,
there exists a unique $y(x) \in \Omega$ such that $\lim_{k \to \infty} (\mathbf{P}_T)^k x = y(x)$,
meaning that for every positive real number $\delta$, there exists a non-negative integer $K$, such
that $\|(\mathbf{P}_T)^k x - y(x)\| \le \delta$, for all non-negative integers $k \ge K$. If, in
addition, there exists a $\gamma \in \mathbb{R}_+$ such that $\mathcal{P}r_T(y(x)) \le \gamma$, for
every $x \in \Omega$, then we say that $\mathbf{P}$ is boundedly convergent for $T$.

A function $f : \Omega \to \mathbb{R}$ is uniformly continuous if, for every
$\varepsilon > 0$ there exists a $\delta > 0$, such that, for all $x, y \in \Omega$,
$|f(x) - f(y)| \le \varepsilon$ provided that $\|x - y\| \le \delta$. An example
of a uniformly continuous function is $Res_S$ of Eq. (2), for
any $S \in \mathbb{S}$. This can be proved by observing that the right-hand
side of Eq. (2) can be rewritten in vector/matrix form
as $\|b - Ax\|$ and then selecting, for any given $\varepsilon > 0$, $\delta$ to be
$\varepsilon / \|A\|$, where $\|A\|$ denotes the matrix norm of $A$.
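The choice $\delta = \varepsilon / \|A\|$ can be checked with the reverse triangle inequality. The following short derivation is a sketch that assumes $\|A\|$ is the operator norm induced by the vector norm and that $A$ is not the zero matrix:

```latex
\begin{align*}
\bigl|\, \|b - Ax\| - \|b - Ay\| \,\bigr|
  &\le \|(b - Ax) - (b - Ay)\| = \|A(y - x)\| \\
  &\le \|A\|\,\|x - y\| \le \|A\| \cdot \frac{\varepsilon}{\|A\|} = \varepsilon
  \quad \text{whenever } \|x - y\| \le \delta = \varepsilon / \|A\| .
\end{align*}
```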

An operator $O : \Delta \to \Omega$ is nonexpansive if $\|Ox - Oy\| \le \|x - y\|$,
for all $x, y \in \Delta$. An example of a nonexpansive
operator is the $R_S$ of Eq. (3). The proof of this is also simple.
It follows from discussions regarding similar claims in
Ref. 27 that the $B_{S_w} : \mathbb{R}^J \to \mathbb{R}^J$ of Eq. (4) is a nonexpansive
operator, for $1 \le w \le W$, and that the operator $Q$ of
Eq. (5) is also nonexpansive. Obviously, a sequential application
of nonexpansive operators results in a nonexpansive
operator and thus $R_S$ is nonexpansive.
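The claim about sequential application can be verified in one line. As a sketch, let $O_1$ and $O_2$ be any two nonexpansive operators (generic placeholder names, not notation from the paper) for which the composition $O_2 O_1$ is defined:

```latex
\[
\|O_2 O_1 x - O_2 O_1 y\| \le \|O_1 x - O_1 y\| \le \|x - y\| .
\]
```

Applying this bound repeatedly extends it to any finite sequence of nonexpansive operators.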

Now we state an important new result that gives sufficient
conditions for strong perturbation resilience: If $P$ is
an algorithm for a problem structure $\langle \mathbb{T}, \mathcal{P}r \rangle$ such that, for
all $T \in \mathbb{T}$, $P$ is boundedly convergent for $T$, $\mathcal{P}r_T : \Omega \to \mathbb{R}$
is uniformly continuous, and $P_T : \Delta \to \Omega$ is nonexpansive,
then $P$ is strongly perturbation resilient. The importance of
this result lies in the fact that the rather ordinary condition of
uniform continuity for the proximity function and the reasonable
conditions of bounded convergence and nonexpansiveness
of the algorithmic operators guarantee that we end up
with a strongly perturbation resilient algorithm. The proof of
this new result involves some mathematical technicalities and
is therefore presented in the Appendix as Theorem 1.

II.D. Optimization criterion and nonascending vector

Now suppose, as is indeed the case for the constrained
optimization problems discussed in Sec. I, that in addition
to a problem structure $\langle \mathbb{T}, \mathcal{P}r \rangle$ we are also provided with
an optimization criterion, which is specified by a function
$\phi : \Delta \to \mathbb{R}$, with the convention that a point in $\Delta$ for which
the value of $\phi$ is smaller is considered superior (from the point
of view of our application) to a point in $\Delta$ for which the value
of $\phi$ is larger. In the tomography context, any of the functions
of $x$ that are listed as a “secondary optimization criterion” (an
alternative name is a “regularizer”) in Sec. 6.4 of Ref. 55 is an
acceptable choice for the optimization criterion $\phi$. These include
weighted norms, the negative of Shannon’s entropy and
total variation. It is the last of these that we discuss in detail in
the illustrative example below. The essential idea of the superiorization
methodology presented in this paper is to make use
of the perturbations of Eq. (6) to transform a strongly perturbation
resilient algorithm that seeks a constraints-compatible
solution into one whose outputs are equally good from the
point of view of constraints-compatibility, but are superior
according to the optimization criterion. We do this by producing
from the algorithm another one, called its superiorized
version, by making sure not only that the $\beta_k v^k$ are
bounded perturbations, but also that $\phi(x^k + \beta_k v^k) \le \phi(x^k)$,
for all $k \ge 0$.

In order to ensure this we introduce a new concept (closely
related to the concept of a “descent direction” that is widely
used in optimization). Given a function $\phi : \Delta \to \mathbb{R}$ and a
point $x \in \Delta$, we say that a vector $d \in \mathbb{R}^J$ is nonascending
for $\phi$ at $x$ if $\|d\| \le 1$ and

there is a $\delta > 0$ such that for all $\lambda \in [0, \delta]$,
$(x + \lambda d) \in \Delta$ and $\phi(x + \lambda d) \le \phi(x)$. (7)

Note that irrespective of the choices of $\phi$ and $x$, there is always
at least one nonascending vector $d$ for $\phi$ at $x$, namely, the
zero-vector, all of whose components are zero. This is a useful
fact for proving results concerning the guaranteed behavior of
our proposed procedures. However, in order to steer our algorithms
towards a point at which the value of $\phi$ is small, we
need to find a $d$ such that $\phi(x + \lambda d) < \phi(x)$ rather than just
$\phi(x + \lambda d) \le \phi(x)$ as in Eq. (7). In some earlier papers on
superiorization27-31 it was assumed that $\Delta = \mathbb{R}^J$ and that $\phi$
is a convex function. This implied that, for any point $x \in \Delta$,
$\phi$ had a subgradient $g \in \mathbb{R}^J$ at the point $x$. It was suggested
that if there is such a $g$ with a positive norm, then $d$ should
be chosen to be $-g / \|g\|$, otherwise $d$ should be chosen to be
the zero vector. However, there are approaches (not involving
subgradients) to selecting an appropriate $d$; an example can be
found in Ref. 32, in which $d$ is found without using subgradients
for the case when $\phi$ is the $\ell_1$-norm of the Haar transform.
The method we used for selecting a nonascending vector in
the experiments reported in this paper is specified at the end
of Subsection III.A.

II.E. Superiorized version of an algorithm

We now make precise the ingredients needed for transforming
an algorithm into its superiorized version. Let $\Omega$ and
$\Delta$ be the underlying sets for a problem structure $\langle \mathbb{T}, \mathcal{P}r \rangle$
($\Omega \subseteq \Delta \subseteq \mathbb{R}^J$, as discussed at the beginning of Subsection
II.B), $P$ be an algorithm for $\langle \mathbb{T}, \mathcal{P}r \rangle$ and $\phi : \Delta \to \mathbb{R}$.
The following description of the Superiorized Version of
Algorithm $P$ produces, for any problem $T \in \mathbb{T}$, a sequence
$R_T = (x^k)_{k=0}^{\infty}$ of points in $\Omega$ for which, for all $k \ge 0$, Eq. (6)
is satisfied. We show this to be true, for any algorithm $P$, after
the description of the Superiorized Version of Algorithm $P$.
Furthermore, since the sequence $R_T$ is steered by the Superiorized
Version of Algorithm $P$ towards a reduced value of $\phi$, there
is an intuitive expectation that the output of the superiorized
version is likely to be superior (from the point of view of
the optimization criterion $\phi$) to the output of the original
unperturbed algorithm. This last statement is not precise and
so it cannot be proved in a mathematical sense for an arbitrary
algorithm $P$; however, that should not stop us from applying
the easy procedure given below for automatically producing
the superiorized version of $P$ and experimentally checking
whether it indeed provides us with outputs superior to those
of the original algorithm. The well-demonstrated nature of
heuristic optimization approaches is that they often work in
practice even when their performance cannot be guaranteed
to be optimal.33-35

Nevertheless, we can push our theory further than the hope
expressed in the last paragraph, by considering superiorized
versions of algorithms that satisfy some condition. In this paper,
the condition that we discuss is strong perturbation resilience.
We show below that if $P$ is strongly perturbation
resilient, then, for any problem $T \in \mathbb{T}$, a sequence $R_T$ produced
by its superiorized version has the following desirable
property: For all $\varepsilon \in \mathbb{R}_+$, if $O(T, \varepsilon, ((P_T)^k x)_{k=0}^{\infty})$ is defined
for every $x \in \Omega$, then $O(T, \varepsilon', R_T)$ is also defined for every
$\varepsilon' > \varepsilon$; in other words, the Superiorized Version of Algorithm
$P$ provides an $\varepsilon'$-compatible output. As stated above, the advantage
of the superiorized version is that its output is likely
to be superior to the output of the original unperturbed algorithm.
We point out that strong perturbation resilience is a
sufficient, but not necessary, condition for guaranteeing such
desirable behavior of the superiorized version; finding additional
sufficient conditions and proving that algorithms that
we wish to superiorize satisfy such conditions is part of our
ongoing research.

The superiorized version assumes that we have available
a summable sequence $(\gamma_\ell)_{\ell=0}^{\infty}$ of positive real numbers (for
example, $\gamma_\ell = a^\ell$, where $0 < a < 1$) and it generates, simultaneously
with the sequence $(x^k)_{k=0}^{\infty}$, sequences $(v^k)_{k=0}^{\infty}$ and
$(\beta_k)_{k=0}^{\infty}$. The latter is generated as a subsequence of $(\gamma_\ell)_{\ell=0}^{\infty}$,
resulting in a summable sequence $(\beta_k)_{k=0}^{\infty}$. The algorithm further
depends on a specified initial point $\bar{x} \in \Omega$ and on a positive
integer $N$. It makes use of a logical variable called loop.


Superiorized Version of Algorithm P

(i)      set $k = 0$
(ii)     set $x^k = \bar{x}$
(iii)    set $\ell = -1$
(iv)     repeat
(v)        set $n = 0$
(vi)       set $x^{k,n} = x^k$
(vii)      while $n < N$
(viii)       set $v^{k,n}$ to be a nonascending vector for $\phi$ at $x^{k,n}$
(ix)         set loop = true
(x)          while loop
(xi)           set $\ell = \ell + 1$
(xii)          set $\beta_{k,n} = \gamma_\ell$
(xiii)         set $z = x^{k,n} + \beta_{k,n} v^{k,n}$
(xiv)          if $z \in \Delta$ and $\phi(z) \le \phi(x^k)$, then
(xv)             set $n = n + 1$
(xvi)            set $x^{k,n} = z$
(xvii)           set loop = false
(xviii)    set $x^{k+1} = P_T x^{k,N}$
(xix)      set $k = k + 1$.
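The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the infinite `repeat` of line (iv) is truncated after `outer_steps` iterations, the summable sequence is taken to be $\gamma_\ell = a^\ell$, and the toy problem at the bottom (a single linear constraint in the plane with the squared norm as $\phi$) together with every function name is hypothetical.

```python
import math


def superiorized(P, phi, nonascending, in_delta, x_bar, N, a=0.5, outer_steps=50):
    """Truncated run of the Superiorized Version of Algorithm P."""
    x = list(x_bar)                       # lines (i)-(ii)
    ell = -1                              # line (iii)
    for _ in range(outer_steps):          # line (iv), truncated
        n = 0                             # line (v)
        x_kn = list(x)                    # line (vi)
        phi_xk = phi(x)                   # phi(x^k), used in line (xiv)
        while n < N:                      # line (vii)
            v = nonascending(x_kn)        # line (viii)
            loop = True                   # line (ix)
            while loop:                   # line (x)
                ell += 1                  # line (xi)
                beta = a ** ell           # line (xii): gamma_ell = a**ell
                z = [xi + beta * vi for xi, vi in zip(x_kn, v)]  # line (xiii)
                if in_delta(z) and phi(z) <= phi_xk:             # line (xiv)
                    n += 1                # line (xv)
                    x_kn = z              # line (xvi)
                    loop = False          # line (xvii)
        x = P(x_kn)                       # line (xviii); (xix) is the for-loop step
    return x


# Hypothetical toy problem: Omega = Delta = R^2, one constraint x1 + x2 = 2.
def project(x):
    """Orthogonal projection onto the line x1 + x2 = 2 (a nonexpansive operator)."""
    shift = (x[0] + x[1] - 2.0) / 2.0
    return [x[0] - shift, x[1] - shift]


def phi(x):
    """Optimization criterion: squared Euclidean norm."""
    return x[0] ** 2 + x[1] ** 2


def toward_origin(x):
    """Nonascending vector for phi: unit vector toward the origin, or zero."""
    norm = math.hypot(x[0], x[1])
    return [0.0, 0.0] if norm == 0.0 else [-x[0] / norm, -x[1] / norm]


x_plain = project([4.0, 0.0])   # unperturbed algorithm (projection is idempotent)
x_sup = superiorized(project, phi, toward_origin, lambda z: True, [4.0, 0.0], N=2)
print(phi(x_plain), phi(x_sup))
```

In this toy case both outputs satisfy the constraint, while the superiorized output has the smaller value of `phi`; no such improvement is guaranteed in general.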

Next  we  analyze  the  behavior  of  the  Superiorized  Version  of 
Algorithm  P. 

The iteration number $k$ is set to 0 in (i) and $x^k = x^0$ is set
to its initial value $\bar{x}$ in (ii). The integer index $\ell$ for picking the
next element from the sequence $(\gamma_\ell)_{\ell=0}^{\infty}$ is initialized to $-1$
by line (iii); it is repeatedly increased by line (xi). The lines
(v)-(xix) that follow the repeat in (iv) perform a complete
iterative step from $x^k$ to $x^{k+1}$; infinite repetitions of such steps
provide the sequence $R_T = (x^k)_{k=0}^{\infty}$. During one iterative step,
there is one application of the operator $P_T$, in line (xviii), but
there are $N$ steering steps aimed at reducing the value of $\phi$;
the latter are done by lines (v)-(xvii). These lines produce a
sequence of points $x^{k,n}$, where $0 \le n \le N$ with $x^{k,0} = x^k$,
$x^{k,n} \in \Delta$, and $\phi(x^{k,n}) \le \phi(x^k)$.

We prove the truth of the last sentence by induction on
the non-negative integers. For $n = 0$, we have by lines (v)
and (vi) that $x^{k,0} = x^k$. But $x^k \in \Omega \subseteq \Delta$, since it is either $\bar{x}$ that
is assumed to be in $\Omega$ due to lines (i) and (ii) or it is in the
range $\Omega$ of $P_T$ due to lines (xviii) and (xix). Now we assume,
for any $0 \le n < N$, that $x^{k,n} \in \Delta$ and $\phi(x^{k,n}) \le \phi(x^k)$ and
show that lines (viii)-(xvii) perform a computation that leads
from $x^{k,n}$ to an $x^{k,n+1} \in \Delta$ that satisfies $\phi(x^{k,n+1}) \le \phi(x^k)$.
To see this, observe that line (viii) sets $v^{k,n}$ to be a nonascending
vector for $\phi$ at $x^{k,n}$, which implies that Eq. (7) is satisfied
with $x = x^{k,n}$ and $d = v^{k,n}$. Line (ix) sets loop to true,
and it remains true while searching for the desired $x^{k,n+1}$,
by repeatedly executing the loop sequence that follows line
(x). In this sequence, line (xi) increases $\ell$ by 1 and line (xii)
sets $\beta_{k,n}$ to $\gamma_\ell$. Thus for the vector $z$ defined by line (xiii),
$z \in \Delta$ and $\phi(z) \le \phi(x^{k,n})$, provided that $\beta_{k,n}$ is not greater
than the $\delta$ in Eq. (7). Since $(\gamma_\ell)_{\ell=0}^{\infty}$ is a summable sequence
of positive real numbers, there must be a positive integer $L$
such that $\gamma_\ell \le \delta$, for all $\ell \ge L$. This implies that if we applied
lines (xi)-(xiii) often enough, we would reach a vector
$z$ that satisfies $z \in \Delta$ and $\phi(z) \le \phi(x^{k,n})$. If the condition in
line (xiv) is not satisfied when the process gets to it, then lines
(xi)-(xiii) are again executed and eventually we get a vector
$z$ for which the condition in line (xiv) is satisfied due to the
induction hypothesis that $\phi(x^{k,n}) \le \phi(x^k)$. By lines (xv) and
(xvi) we see that at that time $x^{k,n+1}$ is set to $z$ and so we obtain
that $x^{k,n+1} \in \Delta$ and $\phi(x^{k,n+1}) \le \phi(x^k)$, as desired. Line
(xvii) sets loop to false and so control is returned to line (vii).
When this happens for the $N$th time, it will be the case that
$n = N$ and, therefore, line (xviii) is used to produce $x^{k+1} \in \Omega$
and the increasing of $k$ by line (xix) allows us then to move
on to the next iterative step. Infinite repetition of such steps
produces the sequence $R_T = (x^k)_{k=0}^{\infty}$ of points in $\Omega$.

We now show that if $O(T, \varepsilon, ((P_T)^k x)_{k=0}^{\infty})$ is defined for
every $x \in \Omega$, then, for any $\varepsilon' > \varepsilon$, the Superiorized Version
of Algorithm $P$ produces an $\varepsilon'$-compatible output. Since $P$
is assumed to be strongly perturbation resilient, this desired
result follows if we can show that there exists a summable
sequence $(\beta_k)_{k=0}^{\infty}$ of non-negative real numbers and a bounded
sequence $(v^k)_{k=0}^{\infty}$ of vectors in $\mathbb{R}^J$ such that Eq. (6) is satisfied
for all $k \ge 0$. In view of line (xviii), this is achieved if we can
define the $\beta_k$ and the $v^k$ so that $x^{k,N} = x^k + \beta_k v^k$. This is
done by setting


$\beta_k = \max\{\beta_{k,n} \mid 0 \le n < N\}$, (8)

$v^k = \sum_{n=0}^{N-1} \frac{\beta_{k,n}}{\beta_k} v^{k,n}$. (9)


That these assignments result in $x^{k,N} = x^k + \beta_k v^k$ follows
from lines (v)-(xvii). From line (xii) it follows that $(\beta_k)_{k=0}^{\infty}$ is
a subsequence of $(\gamma_\ell)_{\ell=0}^{\infty}$ and, hence, it is a summable sequence
of non-negative real numbers. Since each $\|v^{k,n}\| \le 1$
by the definition of a nonascending vector, it follows from
Eqs. (8) and (9) that $\|v^k\| \le N$ and so $(v^k)_{k=0}^{\infty}$ is bounded.
Part of the condition expressed in Eq. (6) is that, for all
$k \ge 0$, $x^k + \beta_k v^k \in \Delta$. This follows from the fact that
$x^{k,N} = x^k + \beta_k v^k$ is assigned its value by line (xvi), but only
if the condition expressed in line (xiv) is satisfied.
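The first two claims in the preceding paragraph follow from unrolling the inner loop. As a sketch: each pass through lines (xiii)-(xvi) adds $\beta_{k,n} v^{k,n}$ to the current point, $\beta_k > 0$ because every $\gamma_\ell$ is positive, and $\beta_{k,n} / \beta_k \le 1$ by Eq. (8), so

```latex
\begin{align*}
x^{k,N} &= x^{k,0} + \sum_{n=0}^{N-1} \beta_{k,n} v^{k,n}
         = x^k + \beta_k \sum_{n=0}^{N-1} \frac{\beta_{k,n}}{\beta_k} v^{k,n}
         = x^k + \beta_k v^k , \\
\|v^k\| &\le \sum_{n=0}^{N-1} \frac{\beta_{k,n}}{\beta_k}\, \|v^{k,n}\|
         \le \sum_{n=0}^{N-1} 1 = N .
\end{align*}
```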

In conclusion, we have shown that the superiorized version
of a strongly perturbation resilient algorithm produces
outputs that are essentially as constraints-compatible as those
produced by the original version of the algorithm. However,
due to the repeated steering of the process by lines (vii)-(xvii)
towards reducing the value of the optimization criterion $\phi$, we
can expect that the output of the superiorized version will be
superior (from the point of view of $\phi$) to the output of the
original algorithm.


II.F. Information on performance comparison
with MAP methods

Using our notation, the constrained minimization formulation
that we are considering is as follows: Given an $\varepsilon \in \mathbb{R}_+$,

minimize $\phi(x)$, subject to $\mathcal{P}r_T(x) \le \varepsilon$. (10)

The aim of superiorization is not identical with the aim of
constrained minimization in Eq. (10). One difference is that $\varepsilon$
is not “given” in the superiorization context. The superiorization
of an algorithm produces a sequence and, for any $\varepsilon$, the
associated output of the algorithm is considered to be the first
$x$ in the sequence for which $\mathcal{P}r_T(x) \le \varepsilon$. The other difference
is that we do not claim that this output is a minimizer of $\phi$
among all points that satisfy the constraint, but hope only that
it is usually an $x$ for which $\phi(x)$ is at the small end of its range
of values over the set of constraint-satisfying points. This latter
difference is generally shared by comparisons of a heuristic
approach with an exact approach to solving a constrained
minimization problem.
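The notion of "output" used here, namely the first point of a sequence whose proximity value is at most $\varepsilon$, can be sketched in a few lines of Python. The function name, the `max_items` safeguard and the toy sequence are assumptions made for illustration only.

```python
from itertools import islice


def first_compatible(sequence, proximity, eps, max_items=10_000):
    """Return the first x in `sequence` with proximity(x) <= eps, or None.

    `max_items` is a practical cutoff for potentially infinite sequences;
    it is not part of the definition in the text.
    """
    for x in islice(sequence, max_items):
        if proximity(x) <= eps:
            return x
    return None


def halving():
    """Toy infinite sequence 1, 1/2, 1/4, ... standing in for (x^k)."""
    x = 1.0
    while True:
        yield x
        x *= 0.5


# With proximity |x| and eps = 0.1, the first compatible point is 1/16.
output = first_compatible(halving(), abs, 0.1)
print(output)
```

Because $\varepsilon$ is only a parameter of this selection step, the same generated sequence yields an output for every $\varepsilon$ for which a compatible point exists.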

The MAP (or regularized) formulation of a physical problem
that leads to the constrained minimization problem (10)
is the unconstrained minimization problem of the form: Given
a $\mu \in \mathbb{R}_+$,

minimize $[\phi(x) + \mu \mathcal{P}r_T(x)]$. (11)

Formulations of both kinds [i.e., the ones of
Eqs. (10) and (11)] are widely used for solving medical
physics problems and the question “Which of these two formulations
leads to faster or better solutions of the underlying
physical problem?” is open. Examples of both formulations
with various choices for $\mathcal{P}r_T$ and $\phi$ are listed in the beginning
parts of the paper of Goldstein and Osher.47

We now return to the question raised near the end of
Sec. I: Will superiorization produce superior results to those
produced by contemporary MAP methods or is it faster than
the better of such methods? As yet, there is very little information
available regarding this general question; in fact, we are
aware of only one published study.45 That study compared
a superiorization algorithm with the algorithm of Goldstein
and Osher that they refer to as TwIST (Ref. 46) with split
Bregman47 as the substep, which is indeed a contemporary
method that uses the MAP formulation. (For example, see the
discussion of the split Bregman method in Ref. 56.) The problem
$S$ to which the two algorithms were applied was one from
the tomographic problem set $\mathbb{S}$ defined in Eq. (1). $Res_S$ as defined
in Eq. (2) was used as the proximity function and total
variation, TV as defined below in Eq. (12), was the choice for
$\phi$. It is reported in Ref. 45 that for the outputs of the two algorithms
that were being compared, the values of $Res_S$ and TV
were very similar, but the superiorization algorithm produced
its output four times faster than the MAP method.


III.  AN  ILLUSTRATIVE  EXAMPLE 
III.A. Application to tomography

We use tomography to refer to the process of reconstructing
a function over a Euclidean space from estimated values
of its integrals along lines (that are usually, but not necessarily,
straight). The particular reconstruction processes to which
our discussion applies are the series expansion methods, see
Sec. 6.3 of Ref. 55, in which it is assumed that the function
to be reconstructed can be approximated by a linear combination
of a finite number (say $J$) of basis functions and the
reconstruction task becomes one of estimating the coefficients
of the basis functions in the expansion. Sometimes,
prior knowledge about the nature of the function to be reconstructed
allows us to confine the sought-after vector $x$ of coefficients
to a subset $\Omega$ of $\mathbb{R}^J$ (such as the non-negative orthant
$\mathbb{R}^J_+$). We use $i$ to index the lines along which we integrate,
$a^i \in \mathbb{R}^J$ to denote the vector whose $j$th component is the integral
of the $j$th basis function along the $i$th line, and $b_i$ to denote
the measured integral of the function to be reconstructed
along the $i$th line. Under these circumstances the constraints
come from the desire that, for each of the lines, $\langle a^i, x \rangle$ should
be close (in some sense) to $b_i$.

To make this concrete, consider Eq. (1). Such a description
of the constraints arises in tomography by grouping the
lines of integration into $W$ blocks, with $\ell_w$ lines in the $w$th
block. Such groupings often (but not always) are done according
to some geometrical condition on the lines (for example,
in case of straight lines, we may decide that all the lines that
are parallel to each other form one block). In this framework,
the proximity function $Res$ defined by Eq. (2) provides a reasonable
measure of the incompatibility of a vector $x$ with the
constraints. The algorithm $R$ described by Eqs. (3)-(5) is applicable
to this concrete formulation.

There are many optimization criteria that have been used in
tomography, see Sec. 6.4 of Ref. 55; here we discuss the one
called TV, whose use has been popular in medical physics
recently, see as examples Refs. 20, 22, 23, and 41-44. The
definition of TV that we use here requires a certain way of
selecting the basis functions. It is assumed that the function to
be reconstructed is defined in the plane $\mathbb{R}^2$ and is zero-valued
outside a square-shaped region in the plane. This region is
subdivided into $J$ smaller equal-sized squares (pixels) and the
$J$ basis functions are defined by having value one in exactly
one pixel and value zero everywhere else. We index the pixels
by $j$ and we let $C$ denote the set of all indices of pixels that
are not in the rightmost column or the bottom row of the pixel
array. For any pixel with index $j$ in $C$, let $r(j)$ and $b(j)$ be the
index of the pixel to its right and below it, respectively. We
define $TV : \mathbb{R}^J \to \mathbb{R}$ by

$TV(x) = \sum_{j \in C} \sqrt{(x_j - x_{r(j)})^2 + (x_j - x_{b(j)})^2}$. (12)
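Eq. (12) translates directly into code. The following Python sketch assumes, purely for illustration, that the pixel values are stored as a list of rows, so that the pixel to the right of `image[r][c]` is `image[r][c + 1]` and the pixel below it is `image[r + 1][c]`; the function name is hypothetical.

```python
import math


def total_variation(image):
    """Total variation of a 2-D pixel array, following Eq. (12).

    The sum runs over the index set C: all pixels that are neither in the
    rightmost column nor in the bottom row.
    """
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows - 1):
        for c in range(cols - 1):
            to_right = image[r][c] - image[r][c + 1]   # x_j - x_r(j)
            to_below = image[r][c] - image[r + 1][c]   # x_j - x_b(j)
            tv += math.sqrt(to_right ** 2 + to_below ** 2)
    return tv


flat = [[0.2, 0.2, 0.2], [0.2, 0.2, 0.2], [0.2, 0.2, 0.2]]
corner = [[0.0, 1.0], [1.0, 1.0]]
print(total_variation(flat), total_variation(corner))
```

A constant image has zero total variation, while the two-by-two example contributes a single term equal to the square root of 2.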

The method we adopted to generate a nonascending vector
for the TV function at an $x \in \mathbb{R}^J$ is based on Theorem 2 of
the Appendix. It is applicable since $TV : \mathbb{R}^J \to \mathbb{R}$ is a convex
function; see, for example, the end of the Proof of Proposition
1 of Ref. 41. Now consider an integer $j'$ such that
$1 \le j' \le J$. Looking at the sum in Eq. (12), we see that $x_{j'}$ appears
in at most three terms, in which $j'$ must be either $j$, or $r(j)$, or
$b(j)$ for some $j \in C$. By taking the formal partial derivatives of
these three terms, we see that $\frac{\partial TV}{\partial x_{j'}}(x)$ is well defined if the
denominator in the formal derivative of each of the three terms
is not zero for $x$. In view of this, we define the $g$ in Theorem 2
as follows. If the denominator in any of the three formal partial
derivatives with respect to $x_{j'}$ has an absolute value less
than a very small positive number (we used $10^{-20}$), then we
set $g_{j'}$ to zero, otherwise we set it to $\frac{\partial TV}{\partial x_{j'}}(x)$. Clearly, the
resulting $g \in \mathbb{R}^J$ satisfies the condition in Theorem 2 and hence
provides a $d$ that is a nonascending vector for TV at $x$.
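The construction of $g$ described above can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the image is stored as a list of rows, and $d$ is obtained from $g$ as $-g / \|g\|$ when $g$ is nonzero and as the zero vector otherwise, mirroring the rule quoted after Eq. (7); Theorem 2 itself is not reproduced in this section. All names are hypothetical.

```python
import math

THRESHOLD = 1e-20  # the "very small positive number" quoted in the text


def tv_nonascending(image):
    """Candidate nonascending direction for TV at `image` (list of rows)."""
    rows, cols = len(image), len(image[0])
    g = [[0.0] * cols for _ in range(rows)]
    degenerate = [[False] * cols for _ in range(rows)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            to_right = image[r][c] - image[r][c + 1]
            to_below = image[r][c] - image[r + 1][c]
            denom = math.sqrt(to_right ** 2 + to_below ** 2)
            if denom < THRESHOLD:
                # The formal derivative of this term is undefined; flag all
                # three pixels that appear in it.
                for i, j in ((r, c), (r, c + 1), (r + 1, c)):
                    degenerate[i][j] = True
                continue
            g[r][c] += (to_right + to_below) / denom   # d/dx_j
            g[r][c + 1] -= to_right / denom            # d/dx_r(j)
            g[r + 1][c] -= to_below / denom            # d/dx_b(j)
    # Components touched by a degenerate term are set to zero.
    for r in range(rows):
        for c in range(cols):
            if degenerate[r][c]:
                g[r][c] = 0.0
    norm = math.sqrt(sum(v * v for row in g for v in row))
    if norm == 0.0:
        return [[0.0] * cols for _ in range(rows)]
    return [[-v / norm for v in row] for row in g]


d = tv_nonascending([[0.0, 1.0], [1.0, 1.0]])
print(d)
```

For the two-by-two example the direction raises the top-left pixel and lowers its right and lower neighbors, which reduces the single term of Eq. (12) for small step sizes.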

Previously reported reconstructions using TV-superiorization
selected the $d$ using subgradients as discussed in the
paragraph following Eq. (7); such a $d$ is not guaranteed to
be a nonascending vector for the TV function. What we are
proposing here is not only mathematically rigorous (in the
sense that it is guaranteed to produce a nonascending vector
for the TV function), but it can also lead to better reconstructions,
as illustrated in Subsection III.D.


III.B.  The  data  generation  for  the  experiments 

The  datasets  used  in  the  experiments  reported  in  this 
paper  were  generated  in  such  a  way  that  they  share  the 
noise-characteristics  of  CT  scanners  when  used  for  scanning 
the  human  head  and  brain;  as  discussed,  for  example,  in 
Chap.  5  of  Ref.  55.  They  were  generated  using  the  software 
SNARK09.57 

The head phantom that was used for data generation is
based on an actual cross section of the human head. It is described
as a collection of geometrical objects (such as ellipses,
triangles, and segments of circles) whose combination accurately
resembles the anatomical features of the actual head
cross section. In addition, the basic phantom contains a large
tumor. The actual phantom used was obtained by a random
variation of the basic phantom, by incorporating into it local
inhomogeneities and small low-contrast tumors at random
locations. This phantom is represented by the image in
Fig. 1(a). That image comprises 485 x 485 pixels each of size
0.376 mm by 0.376 mm. The values assigned to the pixels are
obtained by an 11 x 11 subsampling of the pixels and averaging
the values assigned to the subsamples by the geometrical
objects that are used to describe the anatomical features
and the tumors. Those values are approximate linear attenuation
coefficients per cm at 60 keV (0.416 for bone, 0.210
for brain, 0.207 for cerebrospinal fluid). The contrast of the
small tumors with their background is 0.003 cm$^{-1}$. In order
to clearly see the low-contrast details in the interior of the
skull, we use zero (black) to represent the value 0.204 (or anything
less) and 255 (white) to represent 0.21675 (or anything
more). The same is true for all the images in the rest of this
paper.
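The display convention just described can be sketched as a small Python function. The text fixes only the two endpoints of the window; the linear interpolation and rounding used between them are assumptions, and the function name is hypothetical.

```python
def to_gray(value, low=0.204, high=0.21675):
    """Map a linear attenuation coefficient (per cm) to a gray level 0..255.

    Values at or below `low` become 0 (black) and values at or above `high`
    become 255 (white); in between, a linear ramp is assumed.
    """
    if value <= low:
        return 0
    if value >= high:
        return 255
    return round(255 * (value - low) / (high - low))


# Tissue values quoted in the text: bone, brain, cerebrospinal fluid.
print(to_gray(0.416), to_gray(0.210), to_gray(0.207))
```

With this window, bone saturates to white while brain and cerebrospinal fluid, which differ by only 0.003 per cm, are separated by dozens of gray levels.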

For the selected head phantom we generated parallel
projection data, in which one view comprises estimates of
integrals through the phantom for a set of 693 equally spaced
parallel lines with a spacing of 0.0376 cm between them. (We
chose to simulate parallel rather than divergent projection
data, since the reconstruction by the method of Ref. 42 with
which we wish to compare the superiorization approach was
performed for us by the authors of Ref. 42 on parallel data.
Even though contemporary CT scanners use divergent projection
data, results obtained by the use of parallel projection
data are relevant to them, since it is known that the quality of
reconstructions from these two modes of data collection is
very similar as long as the data generations use similar frequencies
of sampling of lines and similar noise characteristics
in the estimated integrals for those lines; see, for example, the
reconstructions from divergent and parallel projection data in
Fig. 5.15 of Ref. 55.) In calculating these estimates, we take
into consideration the effects of photon statistics, detector
width, and scatter. Details of how we do this exactly can be
found in Secs. 5.5 and 5.9 of Ref. 55. Briefly, quantum noise
is calculated based on the assumption that approximately
2 000 000 photons enter the head along each ray, detector
width is simulated by using 11 subrays along each of
which the attenuation is calculated independently and then
combined at the detector, and 5% of the photons get counted
not by the detector for the ray in question but by detectors for
the neighboring rays. For the experiments in this paper, we
did not simulate the polyenergetic nature of the x-ray source.

Fig. 1. (a) A head phantom. (b) Reconstruction of the head phantom from realistically simulated projection data for 360 views using ART with blob basis
functions.


To indicate what can be achieved in clinical CT, we show in
Fig. 1(b) a reconstruction that was made from data comprising
360 such views with the reconstruction algorithm known
as ART with blob basis functions; see Chap. 11 of Ref. 55.

III.C.  Superiorization  reconstruction  from  a  few  views 

The main reason in the literature for advocating the use of
TV as the optimization criterion is that by doing so one can
achieve efficacious reconstructions even from sparsely sampled
data. In our own work31 with realistically simulated CT
data, we found that this is not always the case and this will be
demonstrated again by the experiments reported in the current
paper.


Fig. 2. Reconstructions using TV as the optimization criterion from realistically simulated projection data for 60 views using (a) ASD-POCS and (b) superiorization.
As compared to Fig. 1(b), these reconstructions fail in two ways: they do not show some of the fine details in the phantom and they present some
artifactual  variations.  The  former  of  these  is  a  consequence  of  reconstructing  from  a  much  smaller  dataset  than  used  for  Fig.  1(b).  The  latter  is  due  to  using  a 
very  narrow  window  (13.5  HU)  in  these  displays.  Were  we  to  use  a  wider  display  window  (e.g.,  from  -429  HU  to  429  HU)  for  the  reconstructions  in  this  figure 
and  in  Fig.  1(b),  the  visual  appearance  of  the  resulting  images  would  be  nearly  indistinguishable. 
There have appeared in the literature some approaches to
TV minimization that seem to indicate a more efficacious performance
for CT than the one reported in Ref. 31. One of
these is the adaptive steepest descent projections onto convex
sets (ASD-POCS) algorithm, which is described in detail in
the much-cited paper of Sidky and Pan42 and whose use has
been since reported in a number of subsequent publications,
for example, in Refs. 23 and 43. We note that ASD-POCS
was designed with the aim of producing an exact minimization
algorithm, in contrast to our heuristic superiorization approach.
Translating Eqs. (6)-(8) of Ref. 42 into our terminology,
the aim of ASD-POCS is the following: Given an
$\varepsilon \in \mathbb{R}_+$, find an $\varepsilon$-compatible $x \in \Omega = \mathbb{R}^J$ for which $TV(x)$
is minimal. [Note that this aim is a special case of the constrained
optimization formulation presented in Eq. (10).] In
order to test ASD-POCS, we generated realistic projection
data as described in Subsection III.B but for only 60 views
at 3° increments with the spacing between the lines for which
integrals are estimated set at 0.752 mm. Thus the number of
rays (and hence the number of photons put into the head) in this
dataset is a 12th of what it is in the dataset used to produce
the reconstruction in Fig. 1(b). A reconstruction from these
data was produced for us using ASD-POCS by the authors of
Ref. 42 (this ensured that it does not suffer due to our misinterpretation
of the algorithm or from our inappropriate choices
of the free parameters); it is shown in Fig. 2(a).
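The factor of 12 can be checked from the figures quoted in the text. This is a sketch that assumes the lines cover the same width in both datasets (so that the number of rays per view scales inversely with the line spacing) and that the number of photons per ray is unchanged:

```latex
\[
\frac{\text{rays in the 60-view dataset}}{\text{rays in the 360-view dataset}}
  \approx \frac{60\ \text{views}}{360\ \text{views}}
  \times \frac{0.376\ \text{mm}}{0.752\ \text{mm}}
  = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12} .
\]
```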

Since the image quality of Fig. 2(a) is not anywhere near
to that of Fig. 1(b), we present here a brief discussion as to
why we are showing such images. Many publications in the
recent medical imaging literature have claimed that medically
efficacious reconstructions can be obtained by the use of
TV-minimization from data as sparse as what was used to produce
Fig. 2(a). (In fact, ASD-POCS was motivated and used with
such an aim in mind.21,42,43) Such publications usually show
reconstructions from sparse data as evidence for the validity
of their claims. They can do this because in their presented
illustrations the features that are observable in the reconstructions
are usually much larger and/or of much higher contrast
against their backgrounds than the small “tumors” in Fig. 1(a),
which are perfectly visible in the reconstruction in Fig. 1(b),
but are not detectable in the reconstruction from sparse data
in Fig. 2(a). The reason why that reconstruction appears to be
unacceptably bad is that the display window (from 0.204 cm$^{-1}$
linear attenuation coefficient to 0.21675 cm$^{-1}$ linear attenuation
coefficient) is very narrow; it was selected to enhance
the visibility of the small low-contrast tumors. The width of
this window corresponds to about 13.5 Hounsfield units (HU).
As compared to this, in their evaluation of sparse-view reconstruction
from flat-panel-detector cone-beam CT, Bian et al.43
use what they call a “soft-tissue grayscale window” (also a
“narrow window”) from -429 HU to 429 HU to display head
phantom reconstructions. Using such a window for our reconstructions
shown in Figs. 2(a) and 1(b) would result in images
that are nearly indistinguishable from each other. Thus
reporting the images using such a display window is consistent
with the claim that a TV-minimizing reconstruction from
a  few  views  is  similar  in  quality  to  a  more  traditional  recon¬ 
struction  from  many  views.  However,  our  much  narrower  dis¬ 


play  window  reveals  that  this  is  not  really  so.  We  therefore 
continue  using  our  much  narrower  window  in  what  follows, 
since  it  clearly  reveals  the  nature  of  the  reconstructions  being 
compared,  warts  and  all. 
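
The effect of the display window can be made concrete with a small sketch. It is ours and purely illustrative: the function name `apply_window`, the 256 gray levels, the two sample pixel values, and the wide comparison window are all assumptions, not taken from any of the cited implementations. Values at or below the lower limit are shown as black and values at or above the upper limit as white, so a narrow window spreads a small attenuation difference over many gray levels.

```python
def apply_window(values, lo, hi, levels=256):
    """Map attenuation values to integer gray levels in [0, levels - 1]."""
    if hi <= lo:
        raise ValueError("window upper limit must exceed lower limit")
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)          # saturate outside the window
        t = (clipped - lo) / (hi - lo)         # position inside the window, 0..1
        out.append(round(t * (levels - 1)))
    return out

NARROW = (0.204, 0.21675)   # window quoted in the text, in cm^-1
WIDE = (0.0, 0.4)           # hypothetical wide window, for comparison only

# Two hypothetical pixel values that differ by a small amount.
pixels = [0.209, 0.211]
narrow_levels = apply_window(pixels, *NARROW)
wide_levels = apply_window(pixels, *WIDE)
print("narrow window gray levels:", narrow_levels)
print("wide window gray levels:  ", wide_levels)
```

With the narrow window the two pixels are separated by many gray levels, while with the wide window they are almost the same shade, which is the point made above about reconstructions appearing indistinguishable.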

While this ASD-POCS reconstruction is not as good as it should be for
diagnostic CT of the brain (due to the sparsity of the data), it is
visually better than the reconstruction using superiorization from similar
data as reported in Ref. 31. We discuss the reasons for this in Subsection
III.D. Here, we concentrate on examining whether one can achieve a
reconstruction using superiorization that is as good as that produced by
ASD-POCS from the same data.

For this we first need to examine the numerical properties of the ASD-POCS
reconstruction. This reconstruction uses 485 x 485 pixels, each of size
0.376 mm by 0.376 mm. This implies that $J = 235{,}225$ and it also
determines the components of the vectors $a^i \in \mathbb{R}^J$ in the
precise specification of the problem S. The $\mathrm{Res}_S$, as defined by
Eq. (2), of the ASD-POCS reconstruction is 0.33 and the TV, as defined by
Eq. (12), is 835.

We applied to the same problem S a superiorized version of the algorithm R
defined by Eq. (3). To complete the specification of R, we point out that
for the ordering of views we chose the “efficient” one that was introduced
in Ref. 58 and is also discussed on p. 209 of Ref. 55. The choices we made
for the superiorization are the following:
$\gamma_\ell = 0.99995^\ell$, $\bar{x}$ is the zero vector, and $N = 20$.
The nonascending vector was computed by the method described in the
paragraph below Eq. (12). Denoting by $R_S$ the infinite sequence of points
in $\Omega$ that is produced by the superiorized version of the algorithm R
when applied to the problem S, we chose as our reconstruction
$x^* = O(S, 0.33, R_S)$. For such a reconstruction we have, by the
definition of O, that $\mathrm{Res}_S(x^*) \le 0.33$; in other words, the
output of the superiorization algorithm is at least as
constraints-compatible with S as the output of ASD-POCS. From the point of
view of TV-minimization, our $x^*$ is slightly better: TV($x^*$) = 826.
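
The overall structure of such a run (N perturbation steps with step sizes taken from a summable sequence $\gamma_\ell$, followed by one application of the algorithmic operator, stopping at the first iterate whose residual is at most $\varepsilon$) can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the 2 x 3 system, the one-dimensional TV, the successive-projection operator `sweep` standing in for the block-iterative R of Eq. (3), and the faster-decaying $\gamma_\ell = 0.5^\ell$. It is not the implementation used for Fig. 2(b), and whether TV is actually reduced depends on the problem and the parameters.

```python
import math

# Hypothetical toy system A x = b with three unknowns.
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
b = [2.0, 2.0]

def residual(x):
    """Euclidean norm of A x - b (plays the role of Res_S)."""
    return math.sqrt(sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
                         for row, bi in zip(A, b)))

def tv(x):
    """One-dimensional total variation (toy stand-in for the TV of Eq. (12))."""
    return sum(abs(x[j + 1] - x[j]) for j in range(len(x) - 1))

def nonascending_tv(x):
    """Return -g/||g||, with g_j the partial derivative of tv where it exists, else 0."""
    n = len(x)
    g = [0.0] * n
    for j in range(n):
        nbrs = [x[i] for i in (j - 1, j + 1) if 0 <= i < n]
        if all(x[j] != v for v in nbrs):       # partial derivative exists
            g[j] = float(sum(1 if x[j] > v else -1 for v in nbrs))
    norm = math.sqrt(sum(gj * gj for gj in g))
    return [0.0] * n if norm == 0.0 else [-gj / norm for gj in g]

def sweep(x):
    """One pass of successive projections onto the hyperplanes <a_i, x> = b_i."""
    x = list(x)
    for row, bi in zip(A, b):
        step = (bi - sum(a * xi for a, xi in zip(row, x))) / sum(a * a for a in row)
        x = [xi + step * a for xi, a in zip(x, row)]
    return x

def superiorize(x0, eps, N=2, gamma=lambda ell: 0.5 ** ell, max_iter=1000):
    """Perturb N times toward smaller tv, then apply sweep; stop when residual <= eps."""
    x, ell = list(x0), -1
    for _ in range(max_iter):
        if residual(x) <= eps:
            break
        y = list(x)
        for _ in range(N):
            d = nonascending_tv(y)
            while True:                        # shrink the step until tv does not exceed tv(x)
                ell += 1
                z = [yi + gamma(ell) * di for yi, di in zip(y, d)]
                if tv(z) <= tv(x):
                    y = z
                    break
        x = sweep(y)
    return x

x_star = superiorize([0.0, 0.0, 0.0], eps=1e-6)
print("residual:", residual(x_star), "tv:", tv(x_star))
```

Because the step sizes are never reset, each accepted perturbation is smaller than the previous one, which is what keeps the perturbations summable.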

The  superiorization  reconstruction  is  displayed  in 
Fig.  2(b).  Visually,  it  is  similar  to  the  reconstruction  produced 
by  ASD-POCS.  From  the  optimization  point  of  view  it 
achieves  the  desired  aim  better  than  ASD-POCS  does,  since 
it results in smaller values for both $\mathrm{Res}_S$ and TV, even
though  only  slightly. 

That the two reconstructions in Fig. 2 are very similar is not surprising
because a comparison of the pseudocodes reveals that the ASD-POCS algorithm
in Ref. 42 is essentially a special case of the Superiorized Version of
Algorithm P, even though it has been derived from rather different
principles. To obtain the ASD-POCS algorithm from our methodology described
here, we would have to choose ART (see Chap. 11 of Ref. 55) as the
algorithm that we are superiorizing. Such a superiorization of ART was
reported in the earliest paper on superiorization.27 For the illustration
in our current paper, we decided to superiorize the block-iterative
algorithm R defined by Eq. (3). This illustrates the generality of the
superiorization approach: it not only is applicable to a large class of
constrained optimization problems, but also enables the use of any of a
large class of iterative algorithms designed to produce
constraints-compatible solutions. A recent publication aimed at producing
an exact TV-minimizing algorithm based on the block-iterative approach is
Ref. 44.

III.D.  Effects  of  variations  in  the  reconstruction 
approach 

The reconstruction in Fig. 2(a) produced by ASD-POCS definitely “looks
better” than a reconstruction in Ref. 31, which was obtained using
superiorization from similar data. Since, as discussed in the last
paragraph of Subsection III.C, the ASD-POCS algorithm in Ref. 42 can be
obtained as a special case of superiorization, it must be that some of the
choices made in the details of the implementations are responsible for the
visual differences. An analysis of the implementational details adopted by
the two approaches revealed several differences. After removing these
differences, the superiorization approach produced the image in Fig. 2(b),
which is very similar to the reconstruction produced by ASD-POCS. We now
list the implementational choices that were made for superiorization to
make its performance match that of the reported implementation of ASD-POCS.

One implementational difference is in the stopping rule of the iterative
algorithm; that is, the choice of $\varepsilon$ in determining the output
$O(S, \varepsilon, R_S)$. Since the data are noisy, the phantom itself does
not match the data exactly. In previously reported implementations of
superiorization it was assumed that the iterative process should terminate
when an image is obtained that is approximately as constraints-compatible
as the phantom; in the case of the phantom and the projection data on which
we report here, the value of $\mathrm{Res}_S$ for the phantom is
approximately 0.91, which is larger than its value (0.33) for the
reconstruction produced by ASD-POCS. The output $O(S, 0.91, R_S)$ is shown
in Fig. 3(a). This is a wonderfully smooth reconstruction; its TV value is
only 771. However, this smoothness comes at a price: not only do we lose
the ability to detect the large tumor, but we cannot even see anatomic
features (such as the ventricular cavities) inside the brain. So it appears
that, in order to see medically relevant features in the brain, overfitting
(in the sense of producing a reconstruction from noisy data that is more
constraints-compatible than the phantom) is desirable.
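
The role of $\varepsilon$ as a stopping rule can be illustrated with a short sketch. It assumes, as the use of O in the text suggests, that the output is the first element of the sequence whose residual is at most $\varepsilon$; the function name `first_eps_compatible`, the cap `max_terms`, and the toy sequence $x^k = 2^{-k}$ with residual $|x|$ are our own illustrative choices.

```python
def first_eps_compatible(sequence, res, eps, max_terms=10_000):
    """Return (k, x_k) for the first x_k in `sequence` with res(x_k) <= eps."""
    for k, x in zip(range(max_terms), sequence):
        if res(x) <= eps:
            return k, x
    raise ValueError("no eps-compatible element within max_terms terms")

def toy_sequence():
    """Hypothetical sequence 1, 1/2, 1/4, ... whose residual |x| shrinks steadily."""
    x = 1.0
    while True:
        yield x
        x /= 2.0

# A looser tolerance stops earlier (smoother, less data-faithful in the text's
# setting); a tighter one runs longer.
k_loose, x_loose = first_eps_compatible(toy_sequence(), abs, eps=0.6)
k_tight, x_tight = first_eps_compatible(toy_sequence(), abs, eps=0.3)
print(k_loose, x_loose, k_tight, x_tight)
```

In the experiments above, the same mechanism explains why $\varepsilon = 0.91$ terminates earlier than $\varepsilon = 0.33$ and yields a smoother but less faithful image.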

In the implementations that produced previously reported reconstructions by
superiorization, the number N in the Superiorized Version of Algorithm P
was always chosen to be 1. It is possible that this is the wrong choice.
Making only this change (N = 1 instead of N = 20) to what led to the
reconstruction in Fig. 2(b) results in the reconstruction shown in
Fig. 3(b). That image appears similar to the image in Fig. 2(b), but it has
a higher TV value, namely, 832, which is still very slightly lower than
that of the ASD-POCS reconstruction. The choice N = 20 was based on the
desire to maintain consistency with what has been practiced using ASD-POCS,
see p. 4790 of Ref. 42. It appears that in the context of our paper the
additional computing cost due to choosing N to be 20 rather than 1 is not
really justified. (We note that if d is selected using subgradients as
discussed in the paragraph following Eq. (7), and thus d is not guaranteed
to be a nonascending vector for the TV function, then the choice of 20
rather than 1 for N results in a considerable improvement. However, an even
greater improvement is achieved even with N = 1 by selecting d as
recommended in this paper.)

Another important difference between the ASD-POCS implementation and the
previous implementations of the superiorization approach is the size of the
pixels in the reconstructions. For the ASD-POCS reconstruction this was
selected to be 0.376 mm by 0.376 mm. In previously reported reconstructions
by superiorization it was assumed that the edge of a pixel should be the
same as the distance between the parallel lines along which the data are
collected; that is, 0.752 mm for our problem S. This assumption proved to
be false. TV-minimization takes care of undesirable artifacts that may
otherwise arise due to the smaller pixels and this leads to a visual
improvement. A superiorizing reconstruction with the larger pixels, using
$\varepsilon = 0.33$ and N = 20, is shown in Fig. 3(c). (We note that the
use of smaller pixels during iterative x-ray CT reconstructions was also
suggested in Ref. 59. However, that approach is quite different from what
is presented here: its final result uses larger pixels whose values are
obtained by averaging assemblies of values provided by the iterative
process to the smaller pixels. There is no such downsampling in our
approach; our final result is presented using the smaller pixels. Its
smoothness is due to reduction of TV by the superiorization approach rather
than to averaging pixel values in a denser digitization.)

Combining the use of the larger pixels with $\varepsilon = 0.91$ and N = 1
results in the reconstruction shown in Fig. 3(d). This reconstruction, for
which the superiorization options were selected according to what was done
in Ref. 31, is visually inferior to those shown in our Fig. 2. The
reconstructions displayed in Fig. 3 also illustrate another important
point, namely, that even though the mathematical results discussed in this
paper are valid for a large range of choices of the parameters in the
superiorization algorithms, for medical efficacy of the reconstructions
attention has to be paid to these choices since they can have a drastic
effect on the quality of the reconstruction.

It has been mentioned in Subsection II.B that except for the presence of Q
in Eq. (3), which enforces non-negativity of the components, R is identical
to the algorithm used and illustrated in Ref. 31. It is known that CT
reconstruction of the brain from many views does not suffer from ignoring
the fact that the components of the x, which represent linear attenuation
coefficients, should be non-negative; as is illustrated in Fig. 1(b). This
remains so when reconstructing from a few views using the method and data
that we have been discussing: if we do everything in exactly the same way
as was done to obtain the reconstruction with TV value 826 that is shown in
our Fig. 2(b) but remove Q from Eq. (3), then we obtain a reconstruction in
Fig. 4(a) whose TV value is 829.

Another variation that deserves discussion, because it has been suggested
in the literature,22 is one that does not come about by making choices for
the general approach of the Superiorized Version of Algorithm P but rather
by changing the nature of the approach. The variation in question is not
applicable in general, but can be applied to the special case when the
algorithm to be superiorized is the R defined by Eq. (3). It was suggested
as an improvement to the approach presented above with the choice N = 1.
The idea was based on recognizing the block-iterative nature of the
algorithmic operator $R_S$ in Eq. (3) and intermingling the perturbation
steps of lines (vii)-(xvii) of the Superiorized Version of Algorithm R with
the individual projection steps (the operators B) of Eq. (3). It was
reported in Ref. 22 that doing this is advantageous compared with using the
Superiorized Version of Algorithm R. However, when we applied the variation
of the Superiorized Version of Algorithm R that is proposed in Ref. 22 to
the problem S that we have been using in this section, we ended up with the
reconstruction in Fig. 4(b) whose TV value is 920. This is not as good as
what was obtained using the version of the algorithm that produced the
reconstruction in Fig. 2(b). We conclude that the variation suggested by
Ref. 22, which does not fit into the theory of our paper, does not have an
advantage over what we are proposing here, at least for the problem S that
we have been discussing in this section. We conjecture that the improvement
reported in Ref. 22 is due to selecting d using subgradients as discussed
in the paragraph following Eq. (7) and, as discussed earlier, such an
improvement is not obtained if d is selected by the more appropriate method
recommended in this paper.

Fig. 3. Reconstructions produced by varying some of the parameters in the
algorithm that produced Fig. 2(b). (a) Changing the termination criterion
from $\varepsilon = 0.33$ to $\varepsilon = 0.91$. (b) Changing the value
of N from 20 to 1. (c) Reconstructing with pixel size 0.752 mm by 0.752 mm
instead of 0.376 mm by 0.376 mm. (d) Reconstructing with all the three
changes of (a)-(c).

Fig. 4. Reconstructions by variations that do not fit into the framework
within which the previously shown reconstructions were produced. (a) Not
using non-negativity in the algorithm. (b) Interleaving perturbations with
blocks.


IV.  DISCUSSION  AND  CONCLUSIONS 

Constrained  optimization  is  an  often-used  tool  in  medical 
physics.  The  methodology  of  superiorization  is  a  heuristic  (as 
opposed  to  exact)  approach  to  constrained  optimization. 

Although the idea of superiorization was introduced in 2007 and its
practical use has been demonstrated in several publications since, this
paper is the first to provide a solid mathematical foundation to
superiorization as applied to the noisy problems of the real world. These
foundations include a precise definition of constraints-compatibility, the
concept of a strongly perturbation resilient algorithm, simple conditions
that ensure that an algorithm is strongly perturbation resilient, the
superiorized version of an algorithm, and a demonstration that the
superiorized version of a strongly perturbation resilient algorithm
produces outputs that are essentially as constraints-compatible as those
produced by the original version but are likely to have a smaller value of
the chosen optimization criterion.

The approach is very general. For any iterative algorithm P and for any
optimization criterion $\phi$ for which we know how to produce nonascending
vectors, the pseudocode given in Subsection II.E automatically provides the
version of P that is superiorized for $\phi$.

We demonstrated superiorization for tomography when total variation is used
as the optimization criterion. In particular, we illustrated on a
particular tomography problem that, in spite of its generality,
superiorization produced a reconstruction that is as good as (from the
points of view of constraints-compatibility and TV-minimization) what was
obtained by the ASD-POCS algorithm that was specially designed for
TV-minimization in tomography.
APPENDIX: MATHEMATICAL PROOFS

1. Conditions for strong perturbation resilience

Theorem 1. Let P be an algorithm for a problem structure
$\langle \mathbb{T}, \mathcal{P}r \rangle$ such that, for all
$T \in \mathbb{T}$, P is boundedly convergent for T,
$\mathcal{P}r_T : \Omega \to \mathbb{R}_+$ is uniformly continuous, and
$P_T : \Delta \to \Omega$ is nonexpansive. Then P is strongly perturbation
resilient.

Proof. We first show that there exists an $\varepsilon \in \mathbb{R}_+$
such that $O(T, \varepsilon, ((P_T)^k x)_{k=0}^{\infty})$ is defined for
every $x \in \Omega$. Under the assumptions of the theorem, let
$\gamma \in \mathbb{R}_+$ be such that $\mathcal{P}r_T(y(x)) \le \gamma$,
for every $x \in \Omega$. We prove that
$O(T, 2\gamma, ((P_T)^k x)_{k=0}^{\infty})$ is defined for every
$x \in \Omega$ as follows. Select a particular $x \in \Omega$. By uniform
continuity of $\mathcal{P}r_T$, there exists a $\delta > 0$, such that
$|\mathcal{P}r_T(z) - \mathcal{P}r_T(y(x))| \le \gamma$, for any
$z \in \Omega$ for which $\|z - y(x)\| \le \delta$. Since P is convergent
for T, there exists a non-negative integer K, such that
$\|(P_T)^K x - y(x)\| \le \delta$. It follows that

$$
|\mathcal{P}r_T((P_T)^K x)| \le |\mathcal{P}r_T((P_T)^K x) - \mathcal{P}r_T(y(x))| + |\mathcal{P}r_T(y(x))| \le 2\gamma. \tag{A1}
$$

Now let $T \in \mathbb{T}$ and $\varepsilon \in \mathbb{R}_+$ be such that
$O(T, \varepsilon, ((P_T)^k x)_{k=0}^{\infty})$ is defined for every
$x \in \Omega$. To prove the theorem, we need to show that
$O(T, \varepsilon', R)$ is defined for every $\varepsilon' > \varepsilon$
and for every sequence $R = (x^k)_{k=0}^{\infty}$ of points in $\Omega$ for
which, for all $k \ge 0$, Eq. (6) is satisfied for bounded perturbations
$\beta_k v^k$. Let $\varepsilon'$ and R satisfy the conditions of the
previous sentence.

For $k \ge 0$, we have, due to the nonexpansiveness of $P_T$, that

$$
\|x^{k+1} - P_T x^k\| = \|P_T(x^k + \beta_k v^k) - P_T x^k\| \le \|\beta_k v^k\|. \tag{A2}
$$

Denote $\|\beta_k v^k\|$ by $r_k$. Clearly, $r_k \in \mathbb{R}_+$ and it
follows from the definition of bounded perturbations that
$\sum_{k=0}^{\infty} r_k < \infty$.

We next prove by induction that, for every pair of non-negative integers k
and i,

$$
\|x^{k+i} - (P_T)^i x^k\| \le \sum_{j=k}^{k+i-1} r_j. \tag{A3}
$$

Let k be an arbitrary non-negative integer. If i = 0, then the value is
zero on both sides of the inequality and hence Eq. (A3) holds. Now assume
that Eq. (A3) holds for an integer $i \ge 0$. Then, by Eq. (A2) and the
nonexpansiveness of $P_T$,

$$
\begin{aligned}
\|x^{k+i+1} - (P_T)^{i+1} x^k\| &\le \|x^{k+i+1} - P_T x^{k+i}\| + \|P_T x^{k+i} - (P_T)^{i+1} x^k\| \\
&\le r_{k+i} + \|x^{k+i} - (P_T)^i x^k\| \\
&\le r_{k+i} + \sum_{j=k}^{k+i-1} r_j \\
&= \sum_{j=k}^{k+i} r_j,
\end{aligned} \tag{A4}
$$

which completes our inductive proof. A consequence of Eq. (A3) is that, for
every pair of non-negative integers k and i,

$$
\|x^{k+i} - (P_T)^i x^k\| \le \sum_{j=k}^{\infty} r_j. \tag{A5}
$$

Due to the summability of the non-negative sequence
$(r_k)_{k=0}^{\infty}$, the right-hand side (and hence the left-hand side)
of this inequality gets arbitrarily close to zero as k increases.

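Since $\mathcal{P}r_T$ is uniformly continuous, there exists a $\delta$
such that, for all $x, y \in \Omega$,
$|\mathcal{P}r_T(x) - \mathcal{P}r_T(y)| \le \varepsilon' - \varepsilon$
provided that $\|x - y\| \le \delta$. Select a k so that
$\sum_{j=k}^{\infty} r_j \le \delta$. By the assumption that
$O(T, \varepsilon, ((P_T)^k x)_{k=0}^{\infty})$ is defined for every
$x \in \Omega$, there exists a non-negative integer i for which
$\mathcal{P}r_T((P_T)^i x^k) \le \varepsilon$. From Eq. (A5) we have, for
this k and i, that $\|x^{k+i} - (P_T)^i x^k\| \le \delta$ and, hence,

$$
|\mathcal{P}r_T(x^{k+i})| \le |\mathcal{P}r_T(x^{k+i}) - \mathcal{P}r_T((P_T)^i x^k)| + |\mathcal{P}r_T((P_T)^i x^k)| \le (\varepsilon' - \varepsilon) + \varepsilon = \varepsilon', \tag{A6}
$$

proving that $O(T, \varepsilon', R)$ is defined. □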
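
As a numerical sanity check of the inequality that bounds the drift of the perturbed sequence by the partial sums of the perturbation norms, the following sketch iterates $x^{k+1} = P(x^k + \beta_k v^k)$ for one particular nonexpansive operator. All specifics are assumptions chosen for illustration: P is the projection onto the closed unit ball of the plane, $\beta_k = 0.8^k$, and the $v^k$ are rotating unit vectors. The check does not replace the proof; it only exercises the bound on one example.

```python
import math

def project_unit_ball(x):
    """Nearest point of the closed unit ball (a nonexpansive operator)."""
    n = math.hypot(*x)
    return list(x) if n <= 1.0 else [xi / n for xi in x]

def perturbed_sequence(x0, beta, K):
    """Iterate x^{k+1} = P(x^k + beta_k v^k); return the points and r_k = ||beta_k v^k||."""
    xs, rs = [list(x0)], []
    for k in range(K):
        v = [math.cos(k), math.sin(k)]         # unit vector, so r_k = |beta_k|
        bk = beta(k)
        rs.append(abs(bk))
        xs.append(project_unit_ball([xi + bk * vi for xi, vi in zip(xs[-1], v)]))
    return xs, rs

def drift_bound_holds(xs, rs, tol=1e-12):
    """Check ||x^{k+i} - P^i x^k|| <= r_k + ... + r_{k+i-1} for all available k, i."""
    K = len(rs)
    for k in range(K + 1):
        y = xs[k]                              # y holds P^i x^k, starting with i = 0
        for i in range(K - k + 1):
            if math.dist(xs[k + i], y) > sum(rs[k:k + i]) + tol:
                return False
            y = project_unit_ball(y)
    return True

xs, rs = perturbed_sequence([3.0, -2.0], lambda k: 0.8 ** k, K=30)
print("bound holds:", drift_bound_holds(xs, rs))
```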

2.  Nonascending  vectors  for  convex  functions 

Theorem 2: Let $\phi : \mathbb{R}^J \to \mathbb{R}$ be a convex function
and let $x \in \mathbb{R}^J$. Let $g \in \mathbb{R}^J$ satisfy the
property: For $1 \le j \le J$, if the jth component $g_j$ of g is not zero,
then the partial derivative $\frac{\partial \phi}{\partial x_j}(x)$ of
$\phi$ at x exists and its value is $g_j$. Define d to be the zero vector
if $\|g\| = 0$ and to be $-g/\|g\|$ otherwise. Then d is a nonascending
vector for $\phi$ at x.

Proof: The theorem is trivially true if $\|g\| = 0$, so we assume that this
is not the case. We denote by I the nonempty set of those indices j for
which $g_j \ne 0$.

For $1 \le j \le J$, let $s_j$ be $g_j/|g_j|$ for $j \in I$ and be 0
otherwise, and let $e^j \in \mathbb{R}^J$ be the vector all of whose
components are zero except for the jth, which is one. Then, for
$1 \le j \le J$, there exists a $\delta_j > 0$ such that, for
$0 \le \lambda_j \le \delta_j$,

$$
\phi(x - \lambda_j s_j e^j) \le \phi(x). \tag{A7}
$$

This is obvious if $s_j = 0$. Otherwise,
$\frac{\partial \phi}{\partial x_j}(x)$ exists and indicates that $\phi$
increases at x if $s_j = 1$ or that $\phi$ decreases at x if $s_j = -1$.
The existence of the desired $\delta_j$ can be derived from the standard
definition of the partial derivative as a limit.

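We define $\delta > 0$ by

$$
\delta = \min_{j \in I} \frac{\|g\|\, \delta_j}{J\, |g_j|}. \tag{A8}
$$

Then we have that, for $0 \le \lambda \le \delta$,

$$
\begin{aligned}
\phi(x + \lambda d) &= \phi\left(x - \lambda \sum_{j=1}^{J} \frac{|g_j|}{\|g\|} s_j e^j\right) \\
&= \phi\left(\sum_{j=1}^{J} \frac{1}{J}\left(x - \frac{J \lambda |g_j|}{\|g\|} s_j e^j\right)\right) \\
&\le \frac{1}{J} \sum_{j=1}^{J} \phi\left(x - \frac{J \lambda |g_j|}{\|g\|} s_j e^j\right) \\
&\le \frac{1}{J} \sum_{j=1}^{J} \phi(x) \\
&= \phi(x).
\end{aligned} \tag{A9}
$$

The first inequality above follows from the convexity of $\phi$ and the
second one follows from Eq. (A7), with $\lambda_j$ defined to be
$J \lambda |g_j| / \|g\|$, combined with Eq. (A8). Thus d is a nonascending
vector for $\phi$ at x. □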
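
The construction in Theorem 2 can be tried out on a convex function that is not differentiable everywhere. The sketch below uses the $\ell_1$ norm as $\phi$ (our choice for illustration, not the TV function used in the paper): the partial derivative with respect to $x_j$ is the sign of $x_j$ when $x_j \ne 0$ and does not exist when $x_j = 0$, in which case the corresponding component of g is set to zero. The function names and the sample point are hypothetical.

```python
import math

def nonascending_from_partials(g):
    """Return the zero vector if ||g|| = 0 and -g/||g|| otherwise."""
    norm = math.sqrt(sum(gj * gj for gj in g))
    return [0.0] * len(g) if norm == 0.0 else [-gj / norm for gj in g]

def l1(x):
    """Convex test function: the sum of absolute values."""
    return sum(abs(xj) for xj in x)

def l1_partials(x):
    """g_j = sign(x_j) where the partial derivative exists (x_j != 0), else 0."""
    return [math.copysign(1.0, xj) if xj != 0.0 else 0.0 for xj in x]

x = [3.0, 0.0, -1.5, 0.5]
d = nonascending_from_partials(l1_partials(x))

# For small enough step sizes the function value must not increase.
checks = []
for lam in (0.1, 0.3, 0.5):
    moved = [xj + lam * dj for xj, dj in zip(x, d)]
    checks.append(l1(moved) <= l1(x))
print("direction:", d)
print("nonascending for the tested steps:", all(checks))
```

Only a few step sizes are tested here; the theorem guarantees the inequality for all sufficiently small non-negative steps.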


a) Author to whom correspondence should be addressed. Electronic mail:
gaboitherman@yahoo.com; URL: http://www.dig.cs.gc.cuny.edu/gabor/index.html.

1 J. O. Deasy, “Multiple local minima in radiotherapy optimization problems
with dose-volume constraints,” Med. Phys. 24, 1157-1161 (1997).

2 G. A. Ezzell, “Genetic and geometric optimization of three-dimensional
radiation therapy treatment planning,” Med. Phys. 23, 293-305 (1996).

3 A. Gustafsson, B. K. Lind, and A. Brahme, “A generalized pencil beam
algorithm for optimization of radiation-therapy,” Med. Phys. 21, 343-357
(1994).

4 A. Gustafsson, B. K. Lind, R. Svensson, and A. Brahme, “Simultaneous
optimization of dynamic multileaf collimation and scanning patterns or
compensation filters using a generalized pencil beam algorithm,” Med. Phys.
22, 1141-1156 (1995).

5 E. Lessard and J. Pouliot, “Inverse planning anatomy-based dose
optimization for hdr-brachytherapy of the prostate using fast simulated
annealing algorithm and dedicated objective function,” Med. Phys. 28,
773-779 (2001).

6 R. Manzke, M. Grass, T. Nielsen, G. Shechter, and D. Hawkes, “Adaptive
temporal resolution optimization in helical cardiac cone beam CT
reconstruction,” Med. Phys. 30, 3072-3080 (2003).

7 A. B. Pugachev, A. L. Boyer, and L. Xing, “Beam orientation optimization
in intensity-modulated radiation treatment planning,” Med. Phys. 27,
1238-1245 (2000).

8 D. M. Shepard, M. A. Earl, X. A. Li, S. Naqvi, and C. Yu, “Direct aperture
optimization: A turnkey solution for step-and-shoot IMRT,” Med. Phys. 29,
1007-1018 (2002).

9 C. Studholme, D. L. G. Hill, and D. J. Hawkes, “Automated
three-dimensional registration of magnetic resonance and positron emission
tomography brain images by multiresolution optimization of voxel similarity
measures,” Med. Phys. 24, 25-35 (1997).

10 Q. W. Wu and R. Mohan, “Algorithms and functionality of an intensity
modulated radiotherapy optimization system,” Med. Phys. 27, 701-711 (2000).

11 Y. Yu and M. C. Schell, “A genetic algorithm for the optimization of
prostate implants,” Med. Phys. 23, 2085-2091 (1996).


5546  Herman et al.: Superiorization: An optimization heuristic for medical physics

12 T. Z. Zhang, R. Jeraj, H. Keller, W. G. Lu, G. H. Olivera, T. R. McNutt,
T. R. Mackie, and B. Paliwal, “Treatment plan optimization incorporating
respiratory motion,” Med. Phys. 31, 1576-1586 (2004).

13 M. Abdoli, M. R. Ay, A. Ahmadian, R. A. Dierckx, and H. Zaidi, “Reduction
of dental filling metallic artifacts in CT-based attenuation correction of
PET data using weighted virtual sinograms optimized by a genetic
algorithm,” Med. Phys. 37, 6166-6177 (2010).

14 S. Bartolac, S. Graham, J. Siewerdsen, and D. Jaffray, “Fluence field
optimization for noise and dose objectives in CT,” Med. Phys. 38, S2-S17
(2011).

15 W. Chen, D. Craft, T. M. Madden, K. Zhang, H. M. Kooy, and G. T. Herman,
“A fast optimization algorithm for multicriteria intensity modulated proton
therapy planning,” Med. Phys. 37, 4938-4945 (2010).

16 J. Fiege, B. McCurdy, P. Potrebko, H. Champion, and A. Cull, “PARETO: A
novel evolutionary optimization approach to multiobjective IMRT planning,”
Med. Phys. 38, 5217-5229 (2011).

17 A. Fredriksson, A. Forsgren, and B. Hardemark, “Minimax optimization for
handling range and setup uncertainties in proton therapy,” Med. Phys. 38,
1672-1684 (2011).

18 C. Holdsworth, M. Kim, J. Liao, and M. H. Phillips, “A hierarchical
evolutionary algorithm for multiobjective optimization in IMRT,” Med. Phys.
37, 4986-4997 (2010).

19 C. Holdsworth, R. D. Stewart, M. Kim, J. Liao, and M. H. Phillips,
“Investigation of effective decision criteria for multiobjective
optimization in IMRT,” Med. Phys. 38, 2964-2974 (2011).

20 T. Kim, L. Zhu, T.-S. Suh, S. Geneser, B. Meng, and L. Xing, “Inverse
planning for IMRT with nonuniform beam profiles using total-variation
regularization (TVR),” Med. Phys. 38, 57-66 (2011).

21 C. Men, H. E. Romeijn, X. Jia, and S. B. Jiang, “Ultrafast treatment plan
optimization for volumetric modulated arc therapy (VMAT),” Med. Phys. 37,
5787-5791 (2010).

22 S. N. Penfold, R. W. Schulte, Y. Censor, and A. B. Rosenfeld, “Total
variation superiorization schemes in proton computed tomography image
reconstruction,” Med. Phys. 37, 5887-5895 (2010).

23 E. Y. Sidky, Y. Duchin, X. Pan, and C. Ullberg, “A constrained,
total-variation minimization algorithm for low-intensity x-ray CT,” Med.
Phys. 38, S117-S125 (2011).

24 H. Stabenau, L. Rivera, E. Yorke, J. Yang, R. Lu, R. J. Radke, and A.
Jackson, “Reduced order constrained optimization (ROCO): Clinical
application to lung IMRT,” Med. Phys. 38, 2731-2741 (2011).

25 Y. Yang and M. J. Rivard, “Dosimetric optimization of a conical breast
brachytherapy applicator for improved skin dose sparing,” Med. Phys. 37,
5665-5671 (2010).

26 X. Zhang, J. Wang, and L. Xing, “Metal artifact reduction in x-ray
computed tomography (CT) by constrained optimization,” Med. Phys. 38,
701-711 (2011).

27 D. Butnariu, R. Davidi, G. T. Herman, and I. G. Kazantsev, “Stable
convergence behavior under summable perturbations of a class of projection
methods for convex feasibility and optimization problems,” IEEE J. Sel.
Top. Signal Process. 1, 540-547 (2007).

28 R. Davidi, G. T. Herman, and Y. Censor, “Perturbation-resilient
block-iterative projection methods with application to image reconstruction
from projections,” Int. Trans. Oper. Res. 16, 505-524 (2009).

29 Y. Censor, R. Davidi, and G. T. Herman, “Perturbation resilience and
superiorization of iterative algorithms,” Inverse Probl. 26, 065008 (2010).

30 T. Nikazad, R. Davidi, and G. T. Herman, “Accelerated
perturbation-resilient block-iterative projection methods with application
to image reconstruction,” Inverse Probl. 28, 035005 (2012).

31 G. T. Herman and R. Davidi, “Image reconstruction from a small number of
projections,” Inverse Probl. 24, 045011 (2008).

32 E. Garduno, R. Davidi, and G. T. Herman, “Reconstruction from a few
projections by ℓ1-minimization of the Haar transform,” Inverse Probl. 27,
055006 (2011).

33 R. L. Rardin and R. Uzsoy, “Experimental evaluation of heuristic
optimization algorithms: A tutorial,” J. Heuristics 7, 261-304 (2001).

34 L. Wernisch, S. Hery, and S. J. Wodak, “Automatic protein design with all
atom force-fields by exact and heuristic optimization,” J. Mol. Biol. 301,
713-736 (2000).

35 S. H. Zanakis and J. R. Evans, “Heuristic optimization: Why, when, and
how to use it,” Interfaces 11, 84-91 (1981).

36 G. T. Herman and W. Chen, “A fast algorithm for solving a linear
feasibility problem with application to intensity-modulated radiation
therapy,” Linear Algebra Appl. 428, 1207-1217 (2008).

37 E. S. Helou Neto and A. R. De Pierro, “Incremental subgradients for
constrained convex optimization: A unified framework and new methods,” SIAM
J. Optim. 20, 1547-1572 (2009).

38 E. S. Helou Neto and A. R. De Pierro, “On perturbed steepest descent
methods with inexact line search for bilevel convex optimization,” Optim.
60, 991-1008 (2011).

39 E. A. Nurminski, “Envelope stepsize control for iterative algorithms
based on Fejer processes with attractants,” Optim. Methods Software 25,
97-108 (2010).

40 P. L. Combettes and J. Luo, “An adaptive level set method for
nondifferentiable constrained image recovery,” IEEE Trans. Image Process.
11, 1295-1304 (2002).

41 P. L. Combettes and J.-C. Pesquet, “Image restoration subject to a total
variation constraint,” IEEE Trans. Image Process. 13, 1213-1222 (2004).

42 E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam
computed tomography by constrained, total-variation minimization,” Phys.
Med. Biol. 53, 4777-4807 (2008).

43 J. Bian, J. H. Siewerdsen, X. Han, E. Y. Sidky, J. L. Prince, C. A.
Pelizzari, and X. Pan, “Evaluation of sparse-view reconstruction from
flat-panel-detector cone-beam CT,” Phys. Med. Biol. 55, 6575-6599 (2010).

44 M. Defrise, C. Vanhove, and X. Liu, “An algorithm for total variation
regularization in high-dimensional linear problems,” Inverse Probl. 27,
065002 (2011).

45 Y. Censor, W. Chen, P. L. Combettes, R. Davidi, and G. T. Herman, “On the
effectiveness of projection methods for convex feasibility problems with
linear inequality constraints,” Comput. Optim. Appl. 51, 1065-1088 (2012).

46 J. Bioucas-Dias and M. Figueiredo, “A new TwIST: Two-step iterative
shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image
Process. 16, 2992-3004 (2007).

47 T. Goldstein and S. Osher, “The split Bregman method for L1 regularized
problems,” SIAM J. Imaging Sci. 2, 323-343 (2009).

48 L. A. Shepp and Y. Vardi, “Maximum likelihood reconstruction for emission
tomography,” IEEE Trans. Med. Imaging 1, 113-122 (1982).

49 E. Levitan and G. T. Herman, “A maximum a posteriori probability
expectation maximization algorithm for image reconstruction in emission to¬
mography,”  IEEE  Trans.  Med.  Imaging  6,  185-192  (1987). 

50W.  Jin,  Y.  Censor,  and  M.  Jiang,  “A  heuristic  superiorization-like  ap¬ 
proach  to  bioluminescence  tomography,”  in  Proceedings  of  the  Inter¬ 
national  Federation  for  Medical  and  Biological  Engineering  (IFMBE) 
(Springer- Verlag,  Berlin,  2012),  Vol.  39,  pp.  1026-1029. 

51H.  M.  Hudson  and  R.  S.  Larkin,  “Accelerated  image  reconstruction  using 
ordered  subsets  of  projection  data,”  IEEE  Trans.  Med.  Imaging  13,  601- 
609  (1994). 

52T.  Elfving,  “Block-iterative  methods  for  consistent  and  inconsistent  linear 
equations,”  Numer.  Math.  35,  1-12  (1980). 

53  P.  P.  B.  Eggermont,  G.  T.  Herman,  and  A.  Lent,  “Iterative  algorithms  for 
large  partitioned  linear  systems,  with  applications  to  image  reconstruction,” 
Linear  Algebra  Appl.  40,  37-67  (1981). 

54R.  Aharoni  and  Y.  Censor,  “Block- iterative  projection  methods  for  parallel 
computation  of  solutions  to  convex  feasibility  problems,”  Linear  Algebra 
Appl.  120,  165-175  (1989). 

55  G.  T.  Herman,  Fundamentals  of  Computerized  Tomography:  Image  Recon¬ 
struction  from  Projections,  2nd  ed.  (Springer,  New  York,  2009). 

56  J.  F.  P.  J.  Abascal,  J.  Chamorro- Servent,  J.  Aguirre,  S.  Arridge,  T.  Correia, 
J.  Ripoli,  J.  J.  Vaquero,  and  M.  Desco,  “Fluorescence  diffuse  optical  to¬ 
mography  using  the  split  Bregman  method,”  Med.  Phys.  38,  6275-6284 
(2011). 

57R.  Davidi,  G.  T.  Herman,  and  J.  Klukowska,  SNARK09:  A  programming 
system  for  the  reconstruction  of  2D  images  from  ID  projections,  2009 
(available  URL:  http://www.snark09.com). 

58  G.  T.  Herman  and  L.  B.  Meyer,  “Algebraic  reconstruction  techniques  can 
be  made  computationally  efficient,”  IEEE  Trans.  Med.  Imaging  12,  600- 
609  (1993). 

59  W.  Zbijewski  and  F.  J.  Beekman,  “Characterization  and  suppression  of  edge 
and  aliasing  artefacts  in  iterative  x-ray  CT  reconstruction,”  Phys.  Med. 
Biol.  49,  145-157  (2004). 


Medical  Physics,  Vol.  39,  No.  9,  September  2012 


J  Optim Theory Appl 

DOI 10.1007/s10957-013-0408-3


Projected Subgradient Minimization Versus Superiorization


Yair Censor · Ran Davidi · Gabor T. Herman · Reinhard W. Schulte · Luba Tetruashvili


Received: 5 February 2013 / Accepted: 17 August 2013
© Springer Science+Business Media New York 2013


Abstract The projected subgradient method for constrained minimization repeatedly
interlaces subgradient steps for the objective function with projections onto the
feasible region, which is the intersection of closed and convex constraint sets, to regain
feasibility. The latter poses a computational difficulty, and, therefore, the projected
subgradient method is applicable only when the feasible region is "simple to project
onto." In contrast to this, in the superiorization methodology a feasibility-seeking
algorithm leads the overall process, and objective function steps are interlaced into it.
This makes a difference because the feasibility-seeking algorithm employs projections
onto the individual constraint sets and not onto the entire feasible region.

We present the two approaches side-by-side and demonstrate their performance on
a problem of computerized tomography image reconstruction, posed as a constrained
minimization problem aiming at finding a constraint-compatible solution that has a
reduced value of the total variation of the reconstructed image.


1 Introduction

Our aim in this paper is to expose the recently developed superiorization methodology
and its ideas to the optimization community by "confronting" it with the projected
subgradient method. We juxtapose the projected subgradient method (PSM)
with the superiorization methodology (SM) and demonstrate their performance on a
large-size real-world application that is modeled, and needs to be solved, as a
constrained minimization problem. The PSM for constrained minimization has been
extensively investigated, see, e.g., [1, Sect. 7.1.2], [2, Sect. 3.2.3]. Its roots are in the
work of Shor [3] for the unconstrained case and in the work of Polyak [4, 5] for the
constrained case. More recent work can be found in, e.g., [6]. The superiorization
methodology was first proposed in [7], although without using the term
superiorization. In that work, perturbation resilience (without using this term) was proved
for the general class of string-averaging projection (SAP) methods, see [8-12], that
use orthogonal projections and relate to consistent constraints. Subsequent
investigations and developments of the SM were done in [13-17]. More information on
superiorization-related work is given in Sect. 3.

It is not claimed that the PSM is the best optimization method for solving
constrained minimization problems, and there are many alternative methods with
which the SM could be compared. So, why did we choose to confront the PSM with our
SM? In a nutshell, our answer is that both methods interlace steps related to the
objective function with steps oriented toward feasibility, but they differ in how they restore
or preserve feasibility. A major difficulty with the PSM is the need to perform, within
each iterative step, an orthogonal projection onto the feasible set of the constrained
minimization problem. If the feasible set is not "simple to project onto," then the
projection requires an independent inner-loop calculation to minimize the distance from
a point to the feasible set, which can be costly and hamper the overall effectiveness
of the PSM.

In the SM, we replace the notion of a fixed feasible set by that of a nonnegative
real-valued proximity function. This function serves as an indicator of how
incompatible a vector is with the constraints. In such a formulation, the merit of an actual
output vector of any algorithm is indicated by the smallness of two numbers,
i.e., the values of the proximity function and the objective function. The underlying
idea of the SM is that many iterative algorithms that produce outputs for which the
proximity function is small are strongly perturbation resilient in the sense that, even if
certain kinds of changes are made at the end of each iterative step, the algorithm still
produces an output for which the proximity function is not larger. This property is
exploited by using permitted changes to steer the algorithm to an output that has not
only a small proximity function value, but also a small objective function value.

The PSM requires that feasibility is regained after each subgradient step by
performing a projection onto the entire feasible set, whereas in the SM the
feasibility-seeking projection method proceeds by projecting (in a well-defined algorithmically
structured regime dictated by the specific projection method) onto the individual sets,
whose intersection is the entire feasible set, and not onto the whole feasible set itself.
This has a potentially great computational advantage.

We elaborate on the motivation for this work in Sect. 2. In Sect. 3 we discuss
some superiorization-related work, in Sect. 4 the SM is presented, and in Sect. 5
we demonstrate the approaches of the SM and the PSM on a realistically-large-size
problem with data that arise from the significant problem of x-ray computed
tomography (CT) with total variation (TV) minimization, followed by some conclusions in
Sect. 6.


2  Motivation  and  Basic  Notions 

Throughout this paper, we assume that $\Omega$ is a nonempty subset of the $J$-dimensional
Euclidean space $\mathbb{R}^J$. We consider constrained minimization problems of the form

$$\text{minimize}\ \{\phi(x) \mid x \in C\}, \tag{1}$$

where $\phi : \mathbb{R}^J \to \mathbb{R}$ is an objective function, and $C \subseteq \Omega$ is a given feasible set.

Since we juxtapose the projected subgradient method (PSM) with the superiorization
methodology (SM) and demonstrate their performance on a large-size real-world
application that is modeled, and needs to be solved, as a constrained minimization
problem, we now outline these two methods and explain our choice in detail.

In order to apply the PSM to solving (1), we need to assume that $C$ is a nonempty
closed convex set and that $\phi$ is a convex function. The PSM generates a sequence of
iterates $\{x^k\}_{k=0}^{\infty}$ according to the recursion formula

$$x^{k+1} = P_C\bigl(x^k - t_k \phi'(x^k)\bigr), \tag{2}$$

where $t_k > 0$ is a step-size, $\phi'(x^k) \in \partial\phi(x^k)$ is a subgradient of $\phi$ at $x^k$, and $P_C$
stands for the orthogonal (least Euclidean norm) projection onto the set $C$.
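
The recursion (2) can be sketched in a few lines when the projection is available in closed form. The following is only an illustrative sketch under stated assumptions: the feasible set is a box (chosen because it is "simple to project onto"), the objective is $\phi(x) = \sum_j |x_j - c_j|$, and the step-size is $t_k = 1/(k+1)$. None of these choices, nor the function names, come from the paper.

```python
from typing import Callable, List

Vector = List[float]


def project_onto_box(x: Vector, lo: float, hi: float) -> Vector:
    """Orthogonal projection onto the box [lo, hi]^J (assumed feasible set)."""
    return [min(max(xi, lo), hi) for xi in x]


def l1_subgradient(x: Vector, c: Vector) -> Vector:
    """A subgradient of phi(x) = sum_j |x_j - c_j| (componentwise sign, 0 at a kink)."""
    return [float((xi > ci) - (xi < ci)) for xi, ci in zip(x, c)]


def projected_subgradient(
    x0: Vector,
    subgrad: Callable[[Vector], Vector],
    project: Callable[[Vector], Vector],
    num_iters: int,
) -> Vector:
    """Iterate x^{k+1} = P_C(x^k - t_k * phi'(x^k)) with the assumed t_k = 1/(k+1)."""
    x = list(x0)
    for k in range(num_iters):
        t_k = 1.0 / (k + 1)
        g = subgrad(x)
        x = project([xi - t_k * gi for xi, gi in zip(x, g)])
    return x


# Hypothetical data: minimize ||x - c||_1 over the box [-1, 1]^3.
c = [2.0, -3.0, 0.5]
x_final = projected_subgradient(
    [0.0, 0.0, 0.0],
    lambda x: l1_subgradient(x, c),
    lambda x: project_onto_box(x, -1.0, 1.0),
    num_iters=50,
)
print(x_final)
```

The whole cost of the projection here is one clamp per coordinate; the difficulty discussed next arises when no such closed form exists for $C$.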

A major difficulty with (2) is the need to perform, within each iterative step, the
orthogonal projection. If the feasible set $C$ is not "simple to project onto," then the
projection requires an independent inner-loop calculation to minimize the distance
from the point $x^k - t_k \phi'(x^k)$ to the set $C$, which can be costly and hamper the overall
effectiveness of an algorithm that uses (2). Also, if the inner loop converges to the
projection onto $C$ only in the limit, then, in practical implementations, it will have to
be stopped after a finite number of steps, and so $x^{k+1}$ will be only an approximation
to the projection onto $C$, and it could even happen that it is not in $C$.

Even if we set aside our worries about projecting onto $C$ in (2), there are still two
concerns when applying the PSM to real-world problems. One is that the iterative
process usually converges to the desired solution only in the limit. In practice, some
stopping rule is applied to terminate the process, and the output at that time may not
even be in $C$, and, even if it is in $C$, it is most unlikely to be the minimizer of $\phi$
over $C$. The second problem in real-world applications comes from the fact that the
constraints, derived from the real-world problem, may not be consistent (e.g., because
they come from noisy measurements), and so $C$ is empty.

Similar criticism actually applies to many constrained-minimization-seeking
algorithms for which asymptotic convergence results are available. In the SM, both of
these objections can be handled by replacing the notion of a fixed feasible set $C$ by
that of a nonnegative real-valued proximity function $\mathrm{Prox}_C : \Omega \to \mathbb{R}_+$. This
function serves as an indicator of how incompatible a vector $x$ is with the constraints.
In such a formulation, the merit of the actual output $x$ of any algorithm is
indicated by the smallness of the two numbers $\mathrm{Prox}_C(x)$ and $\phi(x)$. For the
formulation of (1), we would define $\mathrm{Prox}_C$ so that its range is the ray of nonnegative real
numbers with $\mathrm{Prox}_C(x) = 0$ if, and only if, $x \in C$, and then the constrained
minimization problem (1) is precisely that of finding an $x$ that is a minimizer of $\phi(x)$ over
$\{x \mid \mathrm{Prox}_C(x) = 0\}$. The above discussion allows us to do away with the
nonemptiness assumption and also to compare the merits of actual outputs of algorithms that
only approximate the aim of the constrained minimization problem.
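
To make the notion concrete, here is a minimal sketch of one possible proximity function. It assumes, purely for illustration, that the constraints are half-spaces $\langle a_i, x \rangle \le b_i$ and measures incompatibility by the Euclidean norm of the constraint violations; this particular choice and the name `proximity` are assumptions, not the definition used in the paper. It is zero exactly on the feasible set and remains meaningful when the constraints are inconsistent.

```python
import math
from typing import List

Vector = List[float]


def proximity(x: Vector, A: List[Vector], b: Vector) -> float:
    """Norm of the violations max(0, <a_i, x> - b_i) of the half-spaces <a_i, x> <= b_i."""
    total = 0.0
    for a_i, b_i in zip(A, b):
        violation = sum(aij * xj for aij, xj in zip(a_i, x)) - b_i
        if violation > 0.0:
            total += violation * violation
    return math.sqrt(total)


# Hypothetical constraints: x_1 <= 1 and x_2 <= 1.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 1.0]
print(proximity([0.5, 0.5], A, b))  # a feasible point
print(proximity([4.0, 5.0], A, b))  # an infeasible point
```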

The recently invented SM incorporates the ideas of the previous paragraph in its
very foundation and formulates the problem with the function $\mathrm{Prox}_C$ instead of the
set $C$. The underlying idea of the SM is that many iterative algorithms that produce
outputs $x$ for which $\mathrm{Prox}_C(x)$ is small are strongly perturbation resilient in the sense
that, even if certain kinds of changes are made at the end of each iterative step, the
algorithm still produces an output $x'$ for which $\mathrm{Prox}_C(x')$ is not larger. This property is
exploited by using permitted changes to steer the algorithm to an output that has not
only a small $\mathrm{Prox}_C$ value, but also a small $\phi$ value. The algorithm that
incorporates such a steering process is referred to as the superiorized version of the original
iterative algorithm. The main practical contribution of the SM is the automatic creation
of the superiorized version, according to a given objective function $\phi$, of just about
any iterative algorithm that aims at producing an $x$ for which $\mathrm{Prox}_C(x)$ is small.

Nevertheless, in order to carry out our comparative study, we restrict our attention
here to a subset of all possible problems to which not only the SM but also the PSM
is applicable. We assume that we are given a family of constraints $\{C_\ell\}_{\ell=1}^{L}$, where
each set $C_\ell$ is a nonempty closed convex subset of $\mathbb{R}^J$ such that

$$C = \bigcap_{\ell=1}^{L} C_\ell \tag{3}$$

is a nonempty subset of $\Omega$ and that it is the feasible set $C$ of (1). Under these
assumptions, we illustrate the application of the SM by the superiorization of
feasibility-seeking projection methods, see, e.g., [18-22] and the recent monograph [23]. Such
methods use projections onto the individual sets $C_\ell$ in order to generate a sequence
$\{x^k\}_{k=0}^{\infty}$ that converges to a point $x^* \in C$. Therefore, contrary to the PSM, one does
not need to assume that $C$ is a "simple to project onto" set, but rather that the
individual sets $C_\ell$ have this property. The latter is indeed often the case, such as, for
example, where the sets $C_\ell$ are hyperplanes or half-spaces onto which we can project
easily, but their intersection is not "simple to project onto."
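
A minimal sketch of such a feasibility-seeking projection method is given below, assuming half-space constraints $C_\ell = \{x \mid \langle a_\ell, x \rangle \le b_\ell\}$ and the simplest control regime, namely cyclic (sequential) projections. The function names and the example data are hypothetical; the block-iterative and string-averaging methods cited above organize the same single-set projections differently.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_halfspace(x: Vector, a: Vector, b: float) -> Vector:
    """Closed-form orthogonal projection of x onto {u | <a, u> <= b}."""
    violation = dot(a, x) - b
    if violation <= 0.0:
        return list(x)  # x already satisfies this constraint
    step = violation / dot(a, a)
    return [xi - step * ai for xi, ai in zip(x, a)]


def sequential_projections(x0: Vector, A: List[Vector], b: Vector, sweeps: int) -> Vector:
    """Project cyclically onto the individual half-spaces, never onto their intersection."""
    x = list(x0)
    for _ in range(sweeps):
        for a_l, b_l in zip(A, b):
            x = project_onto_halfspace(x, a_l, b_l)
    return x


# Hypothetical constraints: x_1 <= 1, x_2 <= 1, x_1 + x_2 >= 1.5.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [1.0, 1.0, -1.5]
print(sequential_projections([3.0, -2.0], A, b, sweeps=60))
```

Each step costs one inner product and one vector update, regardless of how complicated the intersection of the half-spaces is.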

The SM is presented precisely in Sect. 4 below. However, the discussion above
is sufficient to explain why we chose the PSM and the SM for our comparative
study. Namely, both methods interlace objective-function-reduction steps with steps
oriented toward feasibility. But exactly here lies a big difference between the two
approaches. The PSM requires that feasibility is regained after subgradient nonascent
steps by performing a projection onto $C$, whereas in the SM the feasibility-seeking
projection method proceeds by projecting (in a well-defined algorithmically
structured regime dictated by the specific projection method) onto the individual sets $C_\ell$
and not onto the whole feasible set $C$. This has a potentially great computational
advantage.


3 Superiorization-Related Previous Work

The superiorization methodology was first proposed in [7], although without using
the term superiorization. In that work, perturbation resilience (without using this
term) was proved for the general class of string-averaging projection (SAP) methods,
see [8-12], that use orthogonal projections and relate to consistent constraints.
Subsequent investigations and developments were done in [13-17]. In [13], the
methodology was formulated over general problem structures that enabled rigorous analysis
and revealed that the approach is not limited to feasibility and optimization. In [14],
perturbation resilience was analyzed for the class of block-iterative projection (BIP)
methods, see [18-22], and applied in this manner. In [15], the advantages of
superiorization for image reconstruction from a small number of projections were
studied, and in [16] two acceleration schemes based on (symmetric and nonsymmetric)
BIP methods were proposed and experimented with. In [17], total variation
superiorization schemes in proton computed tomography (pCT) image reconstruction were
investigated.

In [24], we introduced the notion of $\varepsilon$-compatibility into the superiorization
approach in order to handle inconsistent constraints. This enabled us to close the logical
discrepancy between the assumption of consistency of constraints and the actual
experimental work done previously. We also introduced there the new notion of strong
perturbation resilience, which generalizes the previously used notion of perturbation
resilience. Algorithmically, the new superiorized algorithm introduced there (and
used here) is different from all previous ones in that it uses the notion of nonascending
direction and in that it allows several perturbation steps for each feasibility-seeking
step, an aspect that has practical advantages.

In [25], superiorization was applied to the expectation maximization (EM)
algorithm instead of the feasibility-seeking projection methods that were used in
superiorization previously. The approach was implemented there to solve an inverse problem
of bioluminescence tomography (BLT) image reconstruction. Such EM superiorization
was investigated further and applied to a problem of Single Photon Emission
Computed Tomography (SPECT) in [26]. Most recently, in [27], the SM was further
investigated numerically, along with many projection methods for the feasibility
problem and for the best approximation problem.

Our superiorization methodology should be distinguished from the works of Helou
Neto and De Pierro [28, 29], of Nedić [30], Ram, Nedić, and Veeravalli [31], and
of Nurminski [32-35]. The lack of cross-referencing between some of these papers
shows that, in spite of the similarities between their approaches, their results were
apparently reached independently.

There are various differences among the works mentioned in the previous
paragraph: differences in the overall setup of the problems, differences in the assumptions
used for the various convergence results, etc. This is not the place for a full review
of all these differences. But we wish to clarify the fundamental difference between
them and the SM. The point is that when two activities are interlaced, here, feasibility
steps and objective function reduction steps, then once the process is running all
such methods look alike. From looking at the iterative formulas, one cannot tell if (a)
"feasibility steps are interlaced into an iterative gradient scheme for objective function
minimization" or if (b) "objective function reduction steps are interlaced into an
iterative projections scheme for feasibility-seeking." The common thread of all works
mentioned in the previous paragraph is that they fall into the category (a), while the
SM is of the kind (b). In all methods of category (a) the condition that is needed to
guarantee convergence to a constrained minimum point is that the diminishing
step-sizes $\alpha_k \to 0$ as $k \to \infty$ must be such that $\sum_{k=0}^{\infty} \alpha_k = +\infty$. In contrast, since the
feasibility-seeking projection method is the "leader" of the overall process in the SM,
we must have that the perturbations (that do the objective function reduction) will
use diminishing step-sizes $\beta_k \to 0$ as $k \to \infty$ but such that $\sum_{k=0}^{\infty} \beta_k < \infty$. The latter
condition guarantees the perturbation resilience of the original feasibility-seeking
projection method so that, regardless of the interlaced objective function reduction
steps, the overall process converges to a feasible, or $\varepsilon$-compatible, point of the
constraints.
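
As a worked illustration of the two step-size regimes (the particular sequences are chosen here only as examples and are not prescribed by the works cited above), a harmonic sequence satisfies the condition of category (a), whereas a geometric sequence satisfies the summability condition required in the SM:

```latex
\alpha_k = \frac{1}{k+1}: \qquad \alpha_k \to 0, \qquad \sum_{k=0}^{\infty} \alpha_k = +\infty ,
\\[4pt]
\beta_k = a^k \ \ (0 < a < 1): \qquad \beta_k \to 0, \qquad \sum_{k=0}^{\infty} \beta_k = \frac{1}{1-a} < \infty .
```

With bounded perturbation vectors, the second choice caps the total length of all objective-function steps at a constant multiple of $1/(1-a)$, which is why they cannot derail the feasibility-seeking process.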

Yet another fundamental difference between the superiorization methodology and
the algorithms of category (a) mentioned above is that those algorithms perform the
interlaced objective function descent and feasibility steps alternatingly according to a
rigid predetermined scheme, whereas in the superiorization methodology the
activation of these steps and the decisions whether to keep an iterate or discard it are done
inside the superiorized algorithm in a controlled and automatically supervised
manner. Thus, the superiorization methodology has the following features not present
in the algorithms of category (a) mentioned above: (i) it conducts iterations of a
feasibility-seeking projection method which is strongly perturbation resilient (as
defined below), (ii) it interlaces objective function nonascent steps into the process in
a controlled and automatically supervised manner, (iii) it is not known to guarantee
convergence to a solution of the constrained minimization problem, and it might (we
do not know if this is so or not) instead only be shown to lead to a feasible point
whose objective function value is less than that of a feasible point that would have
been reached by the same feasibility-seeking projection method without the
perturbations exercised by the superiorized algorithm.

The adaptive steepest descent projections onto convex sets (ASD-POCS)
algorithm described in [36] has some similarities to the SM. However, it is not as general
as the SM; see [24] for a comparison.


4 The Superiorization Methodology

In this section we present a restricted version of the SM of [24] adapted to our
problem (1). As discussed in Sect. 2, we associate with the feasible set $C$ in (1) a proximity
function $\mathrm{Prox}_C : \Omega \to \mathbb{R}_+$ that is an indicator of how incompatible an $x \in \Omega$ is with
the constraints. For any given $\varepsilon > 0$, a point $x \in \Omega$ for which $\mathrm{Prox}_C(x) \le \varepsilon$ is called
an $\varepsilon$-compatible solution for $C$. We further assume that we have, for the $C$ in (1),
a feasibility-seeking algorithmic operator $A_C : \mathbb{R}^J \to \Omega$, with which we define the
following basic algorithm.

The Basic Algorithm

(B1) Initialization: Choose an arbitrary $x^0 \in \Omega$,

(B2) Iterative Step: Given the current iterate $x^k$, calculate the next iterate $x^{k+1}$ by

$$x^{k+1} = A_C(x^k). \tag{4}$$

The following definition helps to evaluate the output of the Basic Algorithm upon
termination by a stopping rule.

Definition 4.1 (The $\varepsilon$-output of a sequence) Given $C \subseteq \mathbb{R}^J$, a proximity function
$\mathrm{Prox}_C : \Omega \to \mathbb{R}_+$, a sequence $\{x^k\}_{k=0}^{\infty} \subset \Omega$ and an $\varepsilon > 0$, then an element $x^K$ of
the sequence which has the properties: (i) $\mathrm{Prox}_C(x^K) \le \varepsilon$, and (ii) $\mathrm{Prox}_C(x^k) > \varepsilon$
for all $0 \le k < K$, is called an $\varepsilon$-output of the sequence $\{x^k\}_{k=0}^{\infty}$ with
respect to the pair $(C, \mathrm{Prox}_C)$. We denote it by $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty}) = x^K$.

Clearly, an $\varepsilon$-output $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ of a sequence $\{x^k\}_{k=0}^{\infty}$ might or might not
exist, but if it does, then it is unique. If $\{x^k\}_{k=0}^{\infty}$ is produced by an algorithm intended
for the feasible set $C$, such as the Basic Algorithm, without a termination criterion,
then $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ is the output produced by that algorithm when it includes the
termination rule to stop when an $\varepsilon$-compatible solution for $C$ is reached.
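
The Basic Algorithm with this termination rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `epsilon_output`, the iteration cap `max_iters`, and the toy operator (halving the iterate, with $C = \{0\}$ and the Euclidean norm as proximity function) are all assumptions made for the example.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def epsilon_output(
    x0: Vector,
    operator: Callable[[Vector], Vector],
    prox: Callable[[Vector], float],
    eps: float,
    max_iters: int,
) -> Optional[Tuple[int, Vector]]:
    """Return (K, x^K) for the first iterate with prox(x^K) <= eps, or None if not reached."""
    x = list(x0)
    for k in range(max_iters + 1):
        if prox(x) <= eps:
            return k, x  # all earlier iterates had prox > eps
        x = operator(x)  # iterative step (4): x^{k+1} = A_C(x^k)
    return None  # no eps-output found within the (assumed) iteration cap


# Toy stand-ins: A_C halves the iterate, C = {0}, Prox_C is the Euclidean norm.
halve = lambda x: [0.5 * xi for xi in x]
norm = lambda x: math.sqrt(sum(xi * xi for xi in x))
print(epsilon_output([8.0, 0.0], halve, norm, eps=1.0, max_iters=100))
```

Returning `None` reflects the remark above that an $\varepsilon$-output might not exist; a finite cap can of course only report that none was found within the allotted iterations.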

Definition 4.2 (Strong perturbation resilience) Assume that we are given a $C \subseteq \Omega$,
a proximity function $\mathrm{Prox}_C$, an algorithmic operator $A_C$ and an $x^0 \in \Omega$. We use
$\{x^k\}_{k=0}^{\infty}$ to denote the sequence generated by the Basic Algorithm when it is
initialized by $x^0$. The Basic Algorithm is said to be strongly perturbation
resilient iff the following hold:

(i) there exists an $\varepsilon > 0$ such that the $\varepsilon$-output $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ exists for every
$x^0 \in \Omega$;

(ii) for every $\varepsilon > 0$, for which the $\varepsilon$-output $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ exists for every $x^0 \in \Omega$,
we have also that the $\varepsilon'$-output $O(C, \varepsilon', \{y^k\}_{k=0}^{\infty})$ exists for every $\varepsilon' > \varepsilon$ and for
every sequence $\{y^k\}_{k=0}^{\infty}$ generated by

$$y^{k+1} = A_C\bigl(y^k + \beta_k v^k\bigr) \quad \text{for all } k \ge 0, \tag{5}$$

where the vector sequence $\{v^k\}_{k=0}^{\infty}$ is bounded, and the scalars $\{\beta_k\}_{k=0}^{\infty}$ are such
that $\beta_k \ge 0$ for all $k \ge 0$ and $\sum_{k=0}^{\infty} \beta_k < \infty$.

Definition 4.3 (Bounded convergence) Assume that we are given a $C \subseteq \mathbb{R}^J$, a
proximity function $\mathrm{Prox}_C$, and an algorithmic operator $A_C : \mathbb{R}^J \to \Omega$. Then the Basic
Algorithm is said to be convergent over $\Omega$ iff for every $x^0 \in \Omega$, there exists
the limit $\lim_{k \to \infty} x^k = y(x^0)$ and $y(x^0) \in \Omega$. It is said to be boundedly
convergent over $\Omega$ iff, in addition, there exists a $\gamma \ge 0$ such that $\mathrm{Prox}_C(y(x^0)) \le \gamma$
for every $x^0 \in \Omega$.

The next theorem, which gives sufficient conditions for strong perturbation resilience
of the Basic Algorithm, has been proved in [24, Theorem 1] (in different wording).

Theorem 4.1 Assume that we are given a $C \subseteq \mathbb{R}^J$, a proximity function $\mathrm{Prox}_C$, and
an algorithmic operator $A_C : \mathbb{R}^J \to \Omega$. If $A_C$ is nonexpansive and is such that it
defines a boundedly convergent Basic Algorithm and if the proximity function $\mathrm{Prox}_C$
is uniformly continuous, then the Basic Algorithm defined by $A_C$ is strongly
perturbation resilient.

Along with the $C \subseteq \mathbb{R}^J$, we look at the objective function $\phi : \mathbb{R}^J \to \mathbb{R}$, with
the convention that a point in $\mathbb{R}^J$ for which the value of $\phi$ is smaller is considered
superior to a point in $\mathbb{R}^J$ for which the value of $\phi$ is larger. The essential idea of the
SM is to make use of the perturbations of (5) to transform a strongly perturbation
resilient algorithm that seeks a constraints-compatible solution for $C$ into one whose
outputs are equally good from the point of view of constraints-compatibility, but are
superior (not necessarily optimal) according to the objective function $\phi$.

This is done by producing from the Basic Algorithm another algorithm, called its
superiorized version, that makes sure not only that the $\beta_k v^k$ are bounded
perturbations, but also that $\phi(y^k + \beta_k v^k) \le \phi(y^k)$ for all $k$. To do so, we use the next concept,
closely related to the concept of "descent direction."

Definition 4.4 Given a function $\phi : \mathbb{R}^J \to \mathbb{R}$ and a point $y \in \mathbb{R}^J$, we say that a vector
$d \in \mathbb{R}^J$ is nonascending for $\phi$ at $y$ iff $\|d\| \le 1$ and there is a $\delta > 0$ such that

$$\text{for all } \lambda \in [0, \delta], \text{ we have } \phi(y + \lambda d) \le \phi(y). \tag{6}$$

Obviously, the zero vector is always such a vector, but for superiorization to work,
we need a strict inequality to occur in (6) frequently enough.
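
One simple way to obtain a nonascending vector, sketched here under the assumption that $\phi$ is differentiable at $y$ (an assumption made for this illustration, not a requirement of Definition 4.4), is to take the normalized negative gradient and to fall back on the zero vector when the gradient vanishes. The function name and the example objective $\phi(x) = \|x\|^2$ are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def nonascending_from_gradient(grad: Callable[[Vector], Vector], y: Vector) -> Vector:
    """Return -grad(y)/||grad(y)||, or the zero vector if the gradient is zero."""
    g = grad(y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return [0.0] * len(y)  # the zero vector is always nonascending
    return [-gi / norm for gi in g]


# Example objective (assumed): phi(x) = ||x||^2 with gradient 2x.
phi = lambda x: sum(xi * xi for xi in x)
grad_phi = lambda x: [2.0 * xi for xi in x]

y = [3.0, 4.0]
d = nonascending_from_gradient(grad_phi, y)
step = [yi + 0.1 * di for yi, di in zip(y, d)]
print(d, phi(step) <= phi(y))
```

The returned vector has norm at most 1 by construction, and a nonzero gradient makes it a descent direction, so the inequality in (6) holds strictly for small positive $\lambda$.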

The Superiorized Version of the Basic Algorithm assumes that we have available
a summable sequence $\{\eta_\ell\}_{\ell=0}^{\infty}$ of positive real numbers (for example, $\eta_\ell = a^\ell$, where
$0 < a < 1$) and it generates, simultaneously with the sequence $\{y^k\}_{k=0}^{\infty}$ in $\Omega$,
sequences $\{v^k\}_{k=0}^{\infty}$ and $\{\beta_k\}_{k=0}^{\infty}$. The latter is generated as a subsequence of $\{\eta_\ell\}_{\ell=0}^{\infty}$,
resulting in a nonnegative summable sequence $\{\beta_k\}_{k=0}^{\infty}$. The algorithm further
depends on a specified initial point $y^0 \in \Omega$ and on a positive integer $N$. It makes use
of a logical variable called loop. The superiorized algorithm is presented next by its
pseudo-code.

Superiorized Version of the Basic Algorithm

 1. set $k = 0$
 2. set $y^k = y^0$
 3. set $\ell = -1$
 4. repeat
 5.     set $n = 0$
 6.     set $y^{k,n} = y^k$
 7.     while $n < N$
 8.         set $v^{k,n}$ to be a nonascending vector for $\phi$ at $y^{k,n}$
 9.         set loop = true
10.         while loop
11.             set $\ell = \ell + 1$
12.             set $\beta_{k,n} = \eta_\ell$
13.             set $z = y^{k,n} + \beta_{k,n} v^{k,n}$
14.             if $\phi(z) \le \phi(y^k)$ then
15.                 set $n = n + 1$
16.                 set $y^{k,n} = z$
17.                 set loop = false
18.     set $y^{k+1} = A_C(y^{k,N})$
19.     set $k = k + 1$
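
The pseudo-code above translates almost line by line into the following sketch. Two things are assumptions added for the illustration and are not part of the pseudo-code: the "repeat" loop is given a stopping rule (stop at an $\varepsilon$-compatible iterate or after `max_outer` outer steps), and $\eta_\ell = a^\ell$ is used as the summable sequence. The toy problem (one hyperplane, $\phi(x) = \|x\|^2$) and all names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def superiorize(
    y0: Vector,
    operator: Callable[[Vector], Vector],      # A_C
    phi: Callable[[Vector], float],            # objective function
    nonascending: Callable[[Vector], Vector],  # nonascending vector for phi at a point
    N: int,
    a: float,                                  # eta_l = a**l with 0 < a < 1
    prox: Callable[[Vector], float],
    eps: float,
    max_outer: int,
) -> Tuple[Vector, int]:
    k = 0                                      # line 1
    y = list(y0)                               # line 2
    ell = -1                                   # line 3
    while prox(y) > eps and k < max_outer:     # line 4, with an assumed stopping rule
        n = 0                                  # line 5
        y_kn = list(y)                         # line 6
        phi_yk = phi(y)
        while n < N:                           # line 7
            v = nonascending(y_kn)             # line 8
            loop = True                        # line 9
            while loop:                        # line 10
                ell += 1                       # line 11
                beta = a ** ell                # line 12
                z = [yi + beta * vi for yi, vi in zip(y_kn, v)]  # line 13
                if phi(z) <= phi_yk:           # line 14
                    n += 1                     # line 15
                    y_kn = z                   # line 16
                    loop = False               # line 17
        y = operator(y_kn)                     # line 18
        k += 1                                 # line 19
    return y, k


# Toy problem: C is the hyperplane x_1 + x_2 = 2 and phi(x) = ||x||^2.
def project_onto_line(x: Vector) -> Vector:
    s = (x[0] + x[1] - 2.0) / 2.0
    return [x[0] - s, x[1] - s]

def prox_line(x: Vector) -> float:
    return abs(x[0] + x[1] - 2.0) / math.sqrt(2.0)

def phi_sq(x: Vector) -> float:
    return x[0] ** 2 + x[1] ** 2

def toward_origin(x: Vector) -> Vector:
    nrm = math.sqrt(phi_sq(x))
    return [0.0, 0.0] if nrm == 0.0 else [-xi / nrm for xi in x]

y_sup, steps = superiorize([4.0, 0.0], project_onto_line, phi_sq, toward_origin,
                           N=2, a=0.5, prox=prox_line, eps=1e-9, max_outer=100)
y_basic = project_onto_line([4.0, 0.0])
print(y_sup, phi_sq(y_sup), "versus", y_basic, phi_sq(y_basic))
```

In this toy run both outputs lie on the hyperplane, but the superiorized one has the smaller objective value, which is the behavior the methodology aims for; the example does not, of course, demonstrate it in general.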

Theorem 4.2 Any sequence $\{y^k\}_{k=0}^{\infty}$, generated by the Superiorized Version of
the Basic Algorithm, satisfies (5). Further, if, for a given $\varepsilon > 0$, the $\varepsilon$-output
$O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ of the Basic Algorithm exists for every $x^0 \in \Omega$, then every sequence
$\{y^k\}_{k=0}^{\infty}$, generated by the Superiorized Version of the Basic Algorithm, has an
$\varepsilon'$-output $O(C, \varepsilon', \{y^k\}_{k=0}^{\infty})$ for every $\varepsilon' > \varepsilon$.

This theorem follows from the analysis of the behavior of the Superiorized Version
of the Basic Algorithm in [24]. In other words, the Superiorized Version produces
outputs that are essentially as constraints-compatible as those produced by the
original, non-superiorized algorithm. However, due to the repeated steering of the process
by lines 7 to 17 toward reducing the value of the objective function $\phi$, we can expect
that the output of the Superiorized Version will be superior (from the point of view
of $\phi$) to the output of the original algorithm.
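The pseudo-code can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the name `superiorize`, the toy constraint set (the line $x_1 + x_2 = 1$ intersected with the unit box), the objective $\phi(x) = \|x\|^2$, the choice $\eta_\ell = a^\ell$ with $a = 0.5$, and the stopping test on a proximity value are all hypothetical choices made for the example.

```python
import math

def superiorize(A_C, phi, direction, prox, y0, N, eps, a=0.5, max_outer=10_000):
    """Sketch of the Superiorized Version of the Basic Algorithm.

    The proximity-based stopping test and max_outer are additions for the
    example; the pseudo-code itself has no termination rule.
    """
    y, ell = list(y0), -1                                    # lines 1-3
    for _ in range(max_outer):                               # line 4
        if prox(y) <= eps:                                   # assumed stopping rule
            break
        phi_yk, y_kn, n = phi(y), list(y), 0                 # lines 5-6
        while n < N:                                         # line 7
            v = direction(y_kn)                              # line 8
            loop = True                                      # line 9
            while loop:                                      # line 10
                ell += 1                                     # line 11
                beta = a ** ell                              # line 12, eta_ell = a**ell
                z = [c + beta * d for c, d in zip(y_kn, v)]  # line 13
                if phi(z) <= phi_yk:                         # line 14
                    n, y_kn, loop = n + 1, z, False          # lines 15-17
        y = A_C(y_kn)                                        # lines 18-19
    return y

# Hypothetical toy problem: C = {x : x1 + x2 = 1} intersected with [0, 1]^2.
def A_C(x):
    r = (x[0] + x[1] - 1.0) / 2.0                # project onto the line
    return [min(1.0, max(0.0, c - r)) for c in x]  # then clamp to the box

def phi(x):
    return x[0] ** 2 + x[1] ** 2

def direction(x):
    # Normalized negative gradient of phi; the zero vector if the gradient vanishes.
    norm = math.hypot(2.0 * x[0], 2.0 * x[1])
    return [0.0, 0.0] if norm == 0.0 else [-2.0 * x[0] / norm, -2.0 * x[1] / norm]

def prox(x):
    return abs(x[0] + x[1] - 1.0)

y0 = [1.0, 0.6]
y_basic = A_C(y0)                                  # one step of the Basic Algorithm
y_sup = superiorize(A_C, phi, direction, prox, y0, N=2, eps=1e-9)
```

In this toy run both outputs satisfy the constraints, while the superiorized output has the smaller value of $\phi$, which is the behavior the text leads one to expect.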


5  A  Computational  Demonstration 

5.1 The x-Ray CT Problem

The fully discretized model in the series expansion approach to the image reconstruction
problem of x-ray computerized tomography (CT) is formulated in the following
manner. A Cartesian grid of square picture elements, called pixels, is introduced into
the region of interest so that it covers the whole picture that has to be reconstructed.
The pixels are numbered in some agreed manner, say from 1 (top left corner pixel)
to $J$ (bottom right corner pixel).

The x-ray attenuation function is assumed to take a constant value $x_j$ throughout
the $j$th pixel for $j = 1, 2, \ldots, J$. Sources and detectors are assumed to be points, and
the rays between them are assumed to be lines. Further, assume that the length of
intersection of the $i$th ray with the $j$th pixel, denoted by $a_j^i$, for $i = 1, 2, \ldots, I$,
$j = 1, 2, \ldots, J$, represents the weight of the contribution of the $j$th pixel to the total
attenuation along the $i$th ray.

The physical measurement of the total attenuation along the $i$th ray, denoted by $b_i$,
represents the line integral of the unknown attenuation function along the path of the
ray. Therefore, in this fully discretized model, the line integral turns out to be a finite
sum, and the model is described by a system of linear equations

$$\sum_{j=1}^{J} x_j a_j^i = b_i, \quad i = 1, 2, \ldots, I. \tag{7}$$

In matrix notation we rewrite (7) as

$$Ax = b, \tag{8}$$

where $b \in \mathbb{R}^I$ is the measurement vector, $x \in \mathbb{R}^J$ is the image vector, and the $I \times J$
matrix $A = (a_j^i)$ is the projection matrix. See [37], especially Sect. 6.3, for a complete
treatment of this subject.

5.2  The  Algorithms  that  We  Use 

In this section we describe the PSM and SM algorithms specifically used in our
demonstration. We applied both algorithms to solve the fully discretized model in
the series expansion approach to the image reconstruction problem of x-ray CT,
formulated in the previous section and represented by the optimization problem

$$\text{minimize}\{\phi(x) \mid Ax = b \text{ and } 0 \le x \le 1\}. \tag{9}$$

The box constraints are natural for this problem: If $x_j$ represents the linear attenuation
coefficient, measured in cm$^{-1}$, at a medically used x-ray energy spectrum in the
$j$th pixel, then the box constraints $0 \le x \le 1$ are reasonable for tissues in the human
body; see Table 4.1 of [37]. Hence, for the image reconstruction problem of x-ray
CT, we define $\Omega$ by

$$\Omega = \{x \in \mathbb{R}^J \mid 0 \le x \le 1\}. \tag{10}$$

We note that this $\Omega$ is bounded.

The choice of $C$ in (1) is of the type specified in (3), with $L = I + 1$,
$C_i = \{x \in \mathbb{R}^J \mid \langle a^i, x \rangle = b_i\}$ for $i = 1, 2, \ldots, I$ and $C_{I+1} = \Omega$. Furthermore, since in the
experiment reported below, we start with a specific image vector $x \in \Omega$ and calculate from
it the measurement vector $b \in \mathbb{R}^I$ using (7), we know that $C$ is a nonempty subset
of $\Omega$, which is the requirement stated below (3).

For any such $C$, we define $\mathrm{Prox}_C : \Omega \to \mathbb{R}_+$ by

$$\mathrm{Prox}_C(x) = \sqrt{\sum_{i=1}^{I} \left(b_i - \langle a^i, x \rangle\right)^2}. \tag{11}$$

Note that this proximity function $\mathrm{Prox}_C$ is uniformly continuous and thus satisfies
the condition stated for it in Theorem 4.1.

Our choice for the objective function $\phi$ is the total variation (TV) of the image
vector $x$. Denoting the $G \times H$ image array $X$ ($GH = J$) obtained from the image
vector $x$ by $X_{g,h} = x_{(g-1)H+h}$ for $1 \le g \le G$ and $1 \le h \le H$, we use

$$\phi(x) = \mathrm{TV}(X) = \sum_{g=1}^{G-1} \sum_{h=1}^{H-1} \sqrt{(X_{g+1,h} - X_{g,h})^2 + (X_{g,h+1} - X_{g,h})^2}. \tag{12}$$
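As an illustration of (12), the TV of a small image can be computed directly from the image vector. The following is a minimal sketch using 0-based indexing, so that $X_{g,h}$ corresponds to `x[g * H + h]`; the function name `total_variation` and the sample images are assumptions made for the example.

```python
import math

def total_variation(x, G, H):
    """TV of the G x H array X with X[g][h] = x[g * H + h] (0-based form of Eq. (12))."""
    X = [x[g * H:(g + 1) * H] for g in range(G)]
    return sum(
        math.hypot(X[g + 1][h] - X[g][h], X[g][h + 1] - X[g][h])
        for g in range(G - 1)
        for h in range(H - 1)
    )

tv_flat = total_variation([0.2] * 6, G=2, H=3)              # constant image
tv_checker = total_variation([0.0, 1.0, 1.0, 0.0], G=2, H=2)  # 2 x 2 checkerboard
```

A constant image has zero TV, while the 2 x 2 checkerboard contributes a single term $\sqrt{1^2 + 1^2} = \sqrt{2}$.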

5.2.1 The Projected Subgradient Method

We implemented the PSM with the choice of $C$ and the objective function $\phi$ described
above. We used the PSM recursion formula (2) and adopted a nonsummable diminishing
step-length rule of the form $t_k = \gamma_k / \|\phi'(x^k)\|$, where $\gamma_k \ge 0$, $\lim_{k \to \infty} \gamma_k = 0$,
and $\sum_{k=0}^{\infty} \gamma_k = \infty$.

The PSM Algorithm

(P1) Initialization: Select a point $x^0 \in \mathbb{R}^J$, select integers $K$ and $M$, use two real
number variables curr and prev, and set curr $= \phi(x^0)$ and prev $=$ curr.
(P2) Iterative step: Given the current iterate $x^k$, calculate the next one as follows:
(P2.1) Calculate a subgradient of $\phi$ at $x^k$, i.e., $\phi'(x^k) \in \partial\phi(x^k)$, a step-size
$t_k = k^{-1/4} / \|\phi'(x^k)\|_2$, and the vector

$$q^k = x^k - t_k \phi'(x^k). \tag{13}$$

(P2.2) Calculate the next iterate as the projection of $q^k$ onto $C$ by solving

$$x^{k+1} = \arg\min \left\{ \tfrac{1}{2} \|x - q^k\|^2 \;\middle|\; Ax = b \text{ and } 0 \le x \le 1 \right\}. \tag{14}$$

(P2.3) If $\phi(x^{k+1}) <$ curr, then curr $= \phi(x^{k+1})$.

(P3) Stopping rule: If $k \bmod K = 0$ (i.e., $k$ is divisible by $K$), then: If prev $-$
curr $<$ prev$/M$ then stop. Otherwise, prev $=$ curr and go to (P2).

That the PSM algorithm converges to a solution of (1) follows from [2, Sect. 3.2.3],
in particular, from Theorem 3.2.2 therein, provided that $\phi$ is convex and locally
Lipschitz continuous and $C$ is closed and convex. The latter is indeed the case for the $C$
in (9). The convexity of the $\phi$ of (12) follows from the end of the proof of
Proposition 1 in [38]; its Lipschitz continuity on the whole space $\mathbb{R}^J$ follows from the fact
that the TV function can be rewritten as

$$\mathrm{TV}(X) = \sum_{g=1}^{G-1} \sum_{h=1}^{H-1} \|A_{g,h}\, x\|, \tag{15}$$

where $A_{g,h}$ is a square matrix having only two nonzero rows, with the first nonzero
row containing only two nonzero elements 1 and -1 that correspond to the variables
$X_{g+1,h}$ and $X_{g,h}$, respectively, and the second nonzero row containing only
two nonzero elements 1 and -1 that correspond to the variables $X_{g,h+1}$ and $X_{g,h}$,
respectively.

In our implementation we solved problem (14), in step (P2.2) above, by considering
its dual

$$\text{maximize}\{f(\lambda) \mid \lambda \in \mathbb{R}^I\}, \tag{16}$$

where

$$f(\lambda) = \tfrac{1}{2} \left\| q^k - A^T\lambda - P_{C_{I+1}}(q^k - A^T\lambda) \right\|^2 - \tfrac{1}{2} \|q^k - A^T\lambda\|^2 - \langle \lambda, b \rangle + \tfrac{1}{2} \|q^k\|^2. \tag{17}$$

The optimal point $x^{*k}$ of (14) is then

$$x^{*k} = P_{C_{I+1}}(q^k - A^T\lambda^{*k}), \tag{18}$$

where $\lambda^{*k}$ is the optimal solution of (16). To find $\lambda^{*k}$, we minimized $-f(\lambda)$ using
the Optimal Method of Nesterov [39], as generalized by Güler [40, p. 188], whose
generic description for unconstrained minimization of a convex function $\theta(\lambda)$, which
is continuously differentiable with Lipschitz continuous gradient, is as follows.

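(N1) Initialization: Select a $\mu^0 \in \mathbb{R}^I$ and a positive $\alpha_{-1}$ and put $\lambda^{-1} = \mu^0$, $\beta_0 = 1$,
and $k = 0$.

(N2) Iterative Step: Given $\lambda^{k-1}$, $\mu^k$, $\alpha_{k-1}$, and $\beta_k$:

(N2.1) Calculate the smallest index $s \ge 0$ for which the following inequality
holds:

$$\theta(\mu^k) - \theta\!\left(\mu^k - 2^{-s}\alpha_{k-1}\nabla\theta(\mu^k)\right) \ge 2^{-s-1}\alpha_{k-1}\|\nabla\theta(\mu^k)\|^2. \tag{19}$$

(N2.2) Calculate the next iterate by

$$\alpha_k = 2^{-s}\alpha_{k-1} \quad \text{and} \quad \lambda^k = \mu^k - \alpha_k \nabla\theta(\mu^k), \tag{20}$$

and update

$$\beta_{k+1} = \tfrac{1}{2}\left(\sqrt{4\beta_k^2 + 1} + 1\right) \tag{21}$$

and

$$\mu^{k+1} = \lambda^k + \frac{\beta_k - 1}{\beta_{k+1}}\left(\lambda^k - \lambda^{k-1}\right). \tag{22}$$

When a stopping rule applies, then the point $\lambda^k$ is the output of the method.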
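Steps (N1) and (N2) can be sketched in Python as follows. This is an illustrative sketch only: the name `optimal_method`, the fixed iteration count used in place of a stopping rule, and the quadratic test function are assumptions, not part of the reported implementation.

```python
import math

def optimal_method(theta, grad, mu0, alpha_init=10.0, num_iter=1000):
    """Sketch of steps (N1)-(N2) with the backtracking test (19).

    A fixed number of iterations replaces the unspecified stopping rule.
    """
    lam_prev = list(mu0)                       # (N1): lambda^{-1} = mu^0
    mu, alpha, beta = list(mu0), alpha_init, 1.0
    for _ in range(num_iter):
        g = grad(mu)
        g2 = sum(c * c for c in g)
        theta_mu = theta(mu)
        step = alpha                           # candidate 2^{-s} * alpha_{k-1}, s = 0
        while True:                            # (N2.1): smallest s satisfying (19)
            lam = [m - step * c for m, c in zip(mu, g)]
            if theta_mu - theta(lam) >= 0.5 * step * g2:
                break
            step *= 0.5
        alpha = step                           # (20)
        beta_next = 0.5 * (math.sqrt(4.0 * beta * beta + 1.0) + 1.0)   # (21)
        w = (beta - 1.0) / beta_next
        mu = [l + w * (l - lp) for l, lp in zip(lam, lam_prev)]        # (22)
        lam_prev, beta = lam, beta_next
    return lam_prev                            # the last lambda^k

# Hypothetical smooth convex test function with minimizer (1, -2) and minimum 0.
def theta(lam):
    return 0.5 * (lam[0] - 1.0) ** 2 + 2.0 * (lam[1] + 2.0) ** 2

def grad_theta(lam):
    return [lam[0] - 1.0, 4.0 * (lam[1] + 2.0)]

lam_out = optimal_method(theta, grad_theta, [0.0, 0.0])
```

Starting from $\alpha_{-1} = 10$, as in the reported experiments, the backtracking test halves the step until (19) holds, so the Lipschitz constant of the gradient does not have to be known in advance.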

In the reported experiments, we used the starting points $x^0$ in the PSM Algorithm
and $\lambda^{-1} = \mu^0$ in (N1) above to be zero vectors. In the initialization step of the PSM
Algorithm, we selected $K = 10$ and $M = 5000$. In (N1), we chose $\alpha_{-1} = 10$.

5.2.2  The  Superiorization  Method 

Our selected choice for the operator $A_C$ in the Basic Algorithm as well as in the
Superiorized Version of the Basic Algorithm, as described in Sect. 4, is based on
an algebraic reconstruction technique (ART), see [37, Chap. 11]. Specifically, for
$i = 1, 2, \ldots, I$, we define the operators $U_i : \mathbb{R}^J \to \mathbb{R}^J$ by

$$U_i(x) = x + \frac{b_i - \langle a^i, x \rangle}{\|a^i\|^2}\, a^i. \tag{23}$$

Defining the projection operator onto the unit box $\Omega$ by $Q : \mathbb{R}^J \to \Omega$,

$$(Q(x))_j = \begin{cases} x_j & \text{if } 0 \le x_j \le 1, \\ 0 & \text{if } x_j < 0, \\ 1 & \text{if } 1 < x_j, \end{cases} \tag{24}$$

for $j = 1, 2, \ldots, J$, we specify the algorithmic operator $A_C : \Omega \to \Omega$ by

$$A_C(x) = Q U_I \cdots U_2 U_1(x). \tag{25}$$

Since the individual $U_i$s as well as the $Q$ are clearly nonexpansive operators, the
same is true for $A_C$.
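A single application of the operator in (25) can be sketched in Python as one ART sweep over the rows followed by clamping to the unit box. The helper name `make_art_operator` and the two small systems below are hypothetical and serve only to illustrate (23) to (25).

```python
def make_art_operator(A, b):
    """Return A_C(x) = Q U_I ... U_2 U_1 (x) for the rows of A (a list of lists)."""
    def A_C(x):
        y = list(x)
        for a_i, b_i in zip(A, b):                   # apply U_1, then U_2, ..., U_I
            norm2 = sum(c * c for c in a_i)
            if norm2 == 0.0:                         # skip an all-zero row
                continue
            step = (b_i - sum(c * yj for c, yj in zip(a_i, y))) / norm2
            y = [yj + step * c for yj, c in zip(y, a_i)]   # Eq. (23)
        return [min(1.0, max(0.0, yj)) for yj in y]        # Q, Eq. (24)
    return A_C

# Orthogonal rows: one sweep lands on the solution (0.6, 0.4) of the system.
sweep = make_art_operator([[1.0, 1.0], [1.0, -1.0]], [1.0, 0.2])
x_one = sweep([0.0, 0.0])

# Here the hyperplane projection leaves the unit box, so Q clamps the result.
clamped = make_art_operator([[1.0, 1.0]], [3.0])([0.0, 0.0])
```

Each $U_i$ is the orthogonal projection onto the hyperplane $\langle a^i, x \rangle = b_i$, so the sweep costs one pass over the rows of $A$ and never requires a projection onto the whole of $C$.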

By well-known properties of ART (see, for example, Sects. 11.2 and 15.8 of [37]),
the Basic Algorithm with this algorithmic operator is convergent over $\Omega$, and, in
fact, for every $x^0 \in \Omega$, the limit $y(x^0)$ is in $C$. It follows that, for every $x^0 \in \Omega$,
$\mathrm{Prox}_C(y(x^0)) = 0$, and so the Basic Algorithm is boundedly convergent. According
to Theorem 4.1, this, combined with the facts that $A_C$ is nonexpansive and the
proximity function $\mathrm{Prox}_C$ is uniformly continuous, implies that the Basic Algorithm
defined by $A_C$ is strongly perturbation resilient.

The following uses the convergence of the Basic Algorithm to an element of $C$ and
Theorem 4.2. Since for all $\varepsilon > 0$, the $\varepsilon$-output $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ of the Basic Algorithm
is defined for every $x^0 \in \Omega$, we also have that every sequence $\{y^k\}_{k=0}^{\infty}$ generated by
the Superiorized Version of the Basic Algorithm has an $\varepsilon'$-output $O(C, \varepsilon', \{y^k\}_{k=0}^{\infty})$
for every $\varepsilon' > 0$. This means that for the specific type of $C$ that is used in our
comparative study, the Superiorized Version of the Basic Algorithm is guaranteed to produce
an $\varepsilon'$-compatible output for any $\varepsilon' > 0$ and any initial point $y^0 \in \Omega$.

The specific choices made when running the Superiorized Version of the Basic
Algorithm for our comparative study were the following. We selected $\eta_\ell = 0.999^\ell$,
$y^0$ to be the zero vector, and $N = 9$. All these choices are based on auxiliary
experiments (not included in this paper) that helped determine optimal parameters
for the data-set discussed in Sect. 5.3. In addition, we need to specify how the
nonascending vector $v^{k,n}$ is selected in line 8 of the Superiorized Version of the Basic
Algorithm. We use the method specified in [24] (especially Sect. II.D, the paragraph
following Eq. (12) and Theorem 2 in the Appendix). Specifically, we define another
vector $w$ and set $v^{k,n}$ to be the zero vector if $\|w\| = 0$ and $-w/\|w\|$ otherwise. The
components of $w$ are computed by $w_j = \frac{\partial\phi}{\partial x_j}(y^{k,n})$ if the partial derivative can be
calculated without a numerical difficulty and $w_j = 0$ otherwise, for $1 \le j \le J$. Looking
at (12), we see that formally the partial derivative $\frac{\partial\phi}{\partial x_j}(y^{k,n})$ is the sum of at
most three fractions; the phrase "numerical difficulty" in the previous sentence refers
to the situation where in one of these fractions the denominator has an absolute value
less than $10^{-20}$.

5.3 The Computational Result

The computational work reported here was done on a single machine using a single
CPU, an Intel i5-3570K 3.4 GHz with 16 GB RAM using the SNARK09 software
package [41, 42]; the phantom, the data, the reconstructions and displays were all
generated within this same framework. In particular, this implies that differences in
the reported reconstruction times are not due to the different algorithms being
implemented in different environments.

Figure 1 shows the phantom used in our study, which is a 485 x 485 digitized
image whose TV is 984. The phantom corresponds to a cross-section of a human
head (based on [37, Fig. 4.6]). It is represented by a vector with 235,225 components,
each standing for the average x-ray attenuation coefficient within a pixel. Each
pixel is of size 0.376 x 0.376 mm$^2$. The values of the components are in the range
[0, 0.6241749]; however, the display range used here was much smaller, namely
[0.204, 0.21675]. The mapping between the two ranges is such that any value below
0.204 is shown as black and any value above 0.21675 is shown as white with a linear
mapping in-between. We used this display window for all images presented here.

Data were collected by calculating line integrals through the digitized head phantom
in Fig. 1 using 60 sets of equally rotated (in 3 degrees increments) parallel lines,
with lines in each set spaced at 0.752 mm from each other. Each line integral gives
rise to a linear equation and represents a hyperplane in $\mathbb{R}^J$. The phantom itself lies in
the intersection of all the hyperplanes that are associated with these lines, and it also
satisfies the box constraints in (10). The total number of linear equations is 18,524,
making our problem underdetermined with 235,225 unknowns (the intersection of all
the hyperplanes is in an at least 216,701-dimensional subspace of $\mathbb{R}^{235,225}$). In the
comparative study, we first applied the PSM and then the SM to these data as follows.

Fig. 1 The head phantom. The value of its TV is 984. Its tomographic data was obtained for 60 views

Fig. 2 Reconstructions of the head phantom of Fig. 1. (a) The image reconstructed by the PSM has
TV = 919 and was obtained after 2217 seconds. (b) The image reconstructed by the SM has TV = 873
and was obtained after 102 seconds

Table 1 Performance comparison of the PSM and the SM when producing the reconstructions in Fig. 2

        TV value    Time (seconds)
PSM     919         2217
SM      873         102

The PSM was implemented as described in Sect. 5.2.1. In particular, it started
with the zero vector, for which $\mathrm{Prox}_C(x^0) = 326$. It was stopped according to the
Stopping Rule (P3); the iteration number at that time was 815, and the value of the
proximity function was $\mathrm{Prox}_C(x^{815}) = 0.0422$, which is very much smaller than the
value at the initial point. The computer time required was 2217 seconds. The TV of
the output was 919, which is less than that of the phantom, indicating that the PSM is
performing its task of producing a constraints-compatible output with a low TV. This
output is shown in Fig. 2(a).

We used the Superiorized Version of the Basic Algorithm, as described in
Sect. 5.2.2, to generate a sequence $\{y^k\}_{k=0}^{\infty}$ until it reached $O(C, 0.0422, \{y^k\}_{k=0}^{\infty})$
and considered that to be the output of the SM. We know that this output must exist
for our problem and that it will be at least as constraints-compatible as
the output of the PSM. The computer time required to obtain this output was 102 seconds,
which is over twenty times shorter than what was needed by the PSM to get its
output. The TV of the SM output was 876, which is also less than that of the output
of the PSM. The SM output is shown in Fig. 2(b).

As summarized in Table 1, with the stopping rule that guarantees that the output of
the SM is at least as constraints-compatible as the output of the PSM, the SM showed
superior efficacy compared to the PSM: it obtained a result with a lower TV value at
less than one twentieth of the computational cost.


6  Conclusions 

The superiorization methodology (SM) allows the conversion of a feasibility-seeking
algorithm, designed to find an $\varepsilon$-compatible solution of the constraints, into a
superiorized algorithm that inserts, into the feasibility-seeking algorithm, objective function
reduction steps while preserving the guaranteed feasibility-seeking nature of the
algorithm. The superiorized algorithm interlaces objective function nonascent steps into
the original process in an automatic manner. In case of strong perturbation resilience
of the original feasibility-seeking algorithm, mathematical results indicate why the
superiorized algorithm will be efficacious for producing an $\varepsilon$-compatible solution
output with a low value of the objective function.

We have presented an example for which the SM finds a better solution to a
constrained minimization problem than the projected subgradient method (PSM), and
in significantly less computation time. This finding is understandable in view of the
nature of how the methods interlace feasibility-oriented activities with optimization
activities. While the PSM requires a projection onto the feasible region of the
constrained minimization problem, the SM needs to do only projections onto the
individual constraints whose intersection is the feasible region. We demonstrated this
experimentally on a large-sized application that is modeled, and needs to be solved,
as a constrained minimization problem.

Feasibility-Seeking and Superiorization Algorithms Applied
to Inverse Treatment Planning in Radiation Therapy

and Lei Xing


Abstract. We apply the recently proposed superiorization methodology (SM)
to the inverse planning problem in radiation therapy. The inverse planning
problem is represented here as a constrained minimization problem of the total
variation (TV) of the intensity vector over a large system of linear two-sided
inequalities. The SM can be viewed conceptually as lying between
feasibility-seeking for the constraints and full-fledged constrained minimization of the
objective function subject to these constraints. It is based on the discovery
that many feasibility-seeking algorithms (of the projection methods variety)
are perturbation-resilient, and can be proactively steered toward a feasible
solution of the constraints with a reduced, thus superiorized, but not necessarily
minimal, objective function value.


December  3,  2013 
1.  Introduction 

Computationally demanding numerical minimization techniques are often used
in optimizing the treatment plan of different types of intensity modulated radiation
therapy (IMRT), for example, in volumetric-modulated arc therapy (VMAT).
However, some commonly employed objective functions and corresponding
minimization techniques are not necessarily the most appropriate for achieving the
desired radiation dose distribution behavior in the patient. This disconnect occurs
because minimal solutions to current minimization formulations are not guaranteed
to provide the desired dose coverage, conformality, or homogeneity. Therefore,
the considerable computational cost associated with some of these minimization
techniques may not be justified.

We propose to apply the recently developed novel superiorization method (SM)
that improves computational tractability by aiming at a solution that is guaranteed
to satisfy the IMRT planning constraints and results in a reduced, but not
necessarily minimal, value of the objective function.

The SM can be viewed conceptually as lying between feasibility-seeking for
the constraints and full-fledged constrained minimization of the objective function
subject to these constraints. It is based on the discovery that many
feasibility-seeking algorithms (of the projection methods variety) are perturbation-resilient,
and can be proactively steered toward a feasible solution of the constraints with a
reduced, but not necessarily minimal, objective function value.

The SM is, thus, capable of producing "superior feasible solutions" by
employing less-demanding feasibility-seeking projection methods. Therefore, it may
replace current computationally demanding constrained minimization methods, and
potentially lead to shorter computational times and improved dose distributions.

The paper is laid out as follows. In Section 2 we briefly acquaint the reader with
the inverse problem of radiation therapy treatment planning and the mathematical
model that we use. In Section 3 a short review of the SM is given, and, in Section
4, we present an illustrative example of how the SM can be applied to planning a prostate
cancer IMRT case. Finally, in Section 5 we provide our conclusions.


2.  The  inverse  problem  of  radiation  therapy  treatment  planning 

Inverse  planning  is  at  the  heart  of  intensity  modulated  treatment  procedures 
and  critically  determines  the  quality  of  the  resulting  treatment  plan.  Usually,  the 
radiation  oncologist  in  charge  defines  the  boundaries  of  the  clinical  and  gross  tumor 
volumes  and  organs  at  risk  (OAR)  for  radiation  late  effects  and  prescribes  the 
minimum  and  maximum  target  doses,  threshold  doses  and/or  volumes  not  to  be 
exceeded  in  OAR  and  gives  importance  factors  for  each.  These  constraints  give 
rise  to  a  mathematical  model  that  requires  the  solution  of  an  inverse  problem.  A 
solution  method  is  run  to  find  a  treatment  plan  consisting  of  intensities  and  timing 
of  different  beam  segments  which  best  matches  all  the  input  criteria. 

However, as practiced now, the therapeutic capacity of these applications is
underutilized because of the computing performance of some of the currently used
minimization methods. In this work, we suggest using the SM to reach an acceptable
treatment plan. Let us first briefly describe the inverse problem at hand; for
more technical details related to different types of IMRT, the reader may consult
review articles such as [A, B, C], to name but a few.

IMRT-type techniques are currently the most advanced form of external radiation
therapy. Unlike with its predecessor, 3D conformal radiation therapy
(3DCRT), the physician needs to clearly define the objective of the treatment plan
by specifying dose and/or volume constraints for the planning target volume (PTV)
and OAR that aim at maximum tumor cell killing and minimum harm to the
patient's normal tissues. The treatment plan resulting from solving a corresponding
mathematical problem defines multiple field directions and the movement of
computer controlled pairs of multileaf collimator (MLC) leaves for each direction.

The MLC leaves are physically controlled plates whose positions change
dynamically during treatment; they help modulate
the beam to achieve the objectives of the physician-defined treatment plan. The
beam, therefore, can be conceptually subdivided into a two-dimensional grid of beam
subunits called beamlets. Finding a deliverable treatment plan comprised of beam
apertures and weights for the multiple directions and possible locations of the MLCs
is the goal of the inverse treatment planning problem. In the next paragraph we
discuss a typical model for the inverse treatment planning problem that leads to a
constrained minimization problem, which in turn, fits the SM framework.

Denote the physician's prescribed dose distribution to the patient by a dose vector
$d = (d_j)_{j=1}^{J} \in \mathbb{R}^J$ where $d_j$ is the dose in voxel $j$ of the fully-discretized patient's
cross-section. The dose distribution $d$ is known to have a linear relationship with
the intensities of the beamlets, denoted by an intensity vector $x = (x_i)_{i=1}^{I} \in \mathbb{R}^I$,
such that $x_i$ is the intensity of the beamlet $i$. The inversion problem can, therefore,
be formulated as a linear system of equations

$$d = Ax, \tag{2.1}$$

where $A$ is the $J \times I$ dose matrix that maps any intensity of beamlets vector $x$ onto
a dose in voxels vector $d$. Here $I$ is the total number of beamlets and $J$ is the total
number of voxels.

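Further assume that there are $S$ structures in the patient's cross-section, for
$s = 1, 2, \ldots, S$, and let $O_s$ be the set of voxel indices that belong to a structure $s$

$$O_s = \{j_{s,1}, j_{s,2}, \ldots, j_{s,m(s)}\}, \tag{2.2}$$

where $m(s)$ is the number of voxels in the structure $s$. Then the system matrix $A$
can be partitioned into blocks

$$A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_S \end{pmatrix}, \tag{2.3}$$

so that a submatrix $A_s$ will contain the rows of $A$ whose indices appear in $O_s$,
and $d(s)$ will be the subvector of $d$ whose component indices appear in $O_s$, and the
system (2.1) becomes

$$\begin{pmatrix} d(1) \\ d(2) \\ \vdots \\ d(S) \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_S \end{pmatrix} x. \tag{2.4}$$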


Following a well-trodden path in this area, with roots in [E] and [F], we replace
the system (2.1) by a more flexible model in which we ask the physician to specify
lower- and upper-dose bounds vectors, $\underline{d}$ and $\overline{d}$, respectively, on all voxels in the
respective structures. For a structure $s$ that is an OAR we define

(2.5)  ^(s)  =  d(5), 

and  for  any  target  structures  s  such  as  the  PTV  we  define 

(2.6)  ^(s)  —  ^(s)‘ 

Hence, for an $s$ that is an OAR we obtain

$$0 \le A_s x \le \overline{d}(s), \tag{2.7}$$

and for a target structure $s$

$$\underline{d}(s) \le A_s x \le e(s), \tag{2.8}$$

where $e(s)$ is an additional clinically-specified upper-bound subvector on the target.
Denoting by $a^i$ the $i$th row of the matrix $A$, the inequalities of (2.7) are, componentwise,

$$0 \le \langle a^{j_{s,\ell}}, x \rangle \le \overline{d}(s), \quad \text{for all } \ell = 1, 2, \ldots, m(s), \tag{2.9}$$

where $j_{s,\ell} \in O_s$, for a structure $s$, and the inequalities of (2.8) are

$$\underline{d}(s) \le \langle a^{j_{s,\ell}}, x \rangle \le e(s), \quad \text{for all } \ell = 1, 2, \ldots, m(s), \tag{2.10}$$

where $\langle \cdot, \cdot \rangle$ stands for the inner product.
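To make the two-sided constraints concrete, the following sketch computes the dose $d = Ax$ of (2.1) for a tiny made-up dose matrix and lists, per structure, the voxels whose dose falls outside the prescribed interval. All names (`dose`, `violations`, the structure labels) and all numbers are hypothetical, and a single scalar lower and upper bound per structure is assumed for simplicity.

```python
def dose(A, x):
    """Dose vector d = A x for a dose matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def violations(A, x, structures):
    """Map each structure name to the voxel indices that break its bounds.

    structures: name -> (voxel indices O_s, lower bound, upper bound).
    """
    d = dose(A, x)
    return {
        name: [j for j in O_s if not (lo <= d[j] <= hi)]
        for name, (O_s, lo, hi) in structures.items()
    }

# Three voxels, two beamlets (illustrative numbers only).
A = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
x = [60.0, 20.0]
structures = {
    "PTV": ([0, 1], 50.0, 66.0),   # target: lower and upper bound
    "OAR": ([2], 0.0, 30.0),       # organ at risk: upper bound only
}
d = dose(A, x)
bad = violations(A, x, structures)
```

Here voxel 1 of the target receives 40, below its lower bound of 50, so this intensity vector is not a feasible point of the system of inequalities.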

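This leads to a system of linear inequalities

$$\begin{pmatrix} \underline{d}(1) \\ \underline{d}(2) \\ \vdots \\ \underline{d}(S) \end{pmatrix} \le \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_S \end{pmatrix} x \le \begin{pmatrix} \overline{d}(1) \\ \overline{d}(2) \\ \vdots \\ \overline{d}(S) \end{pmatrix}, \tag{2.11}$$

which serves as the constraints set for the inverse problem modeled as a minimization
problem. For the objective function $\phi$ we use the total variation (TV) of the
intensity vector $x$, given by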

$$\phi(x) = \mathrm{TV}(X) = \sum_{u=1}^{U-1} \sum_{v=1}^{V-1} \sqrt{(x_{u+1,v} - x_{u,v})^2 + (x_{u,v+1} - x_{u,v})^2}, \tag{2.12}$$

where the two-dimensional array $X$ is obtained from the intensity vector $x$ by
$X = \{x_{u,v}\}_{u=1,v=1}^{U,V}$, where $U$ and $V$ are integers (and $UV = J$). The use of TV
minimization in radiation therapy treatment planning was suggested by Zhu et
al. in [G], but they used a different modeling approach that led them to a
minimization problem, rather than a feasibility problem like ours in (2.11). They
handled the TV minimization by using it to regularize their objective function and
applied an exact constrained minimization algorithm, which resulted in a large
computational burden.

Our  approach  leads  us  to  the  constrained  minimization  problem  (3.1)  with 
(2.12)  as  the  objective  and  (2.11)  as  the  constraints. 

3.  A  short  review  of  the  SM 

The superiorization methodology (SM) of [H, I] is intended for nonlinear
constrained minimization (CM) problems of the form

$$\text{minimize}\,\{\phi(x) \mid x \in C\}, \tag{3.1}$$

where $\phi : \mathbb{R}^J \to \mathbb{R}$ is an objective function and $C \subseteq \Theta \subseteq \mathbb{R}^J$ is a given feasible set
defined by a family of constraints $\{C_\ell\}_{\ell=1}^{L}$, where each set $C_\ell$ is a nonempty closed
convex subset of $\mathbb{R}^J$, so that $C = \bigcap_{\ell=1}^{L} C_\ell \ne \emptyset$.

In a nutshell, the new paradigm of superiorization lies between feasibility-seeking
and CM. It is not quite trying to solve the full-fledged CM; rather, the
task is to find a feasible point that is superior (with respect to the objective function
value) to one returned by a feasibility-seeking only algorithm.

The SM could be beneficial for a problem for which an exact CM algorithm has
not yet been discovered, or when existing exact optimization algorithms are very
time consuming or require too many computer resources for realistic large problems.
If, in such cases, there exist (space- and time-) efficient iterative feasibility-seeking
projection methods that provide non-optimal but constraints-compatible
solutions, then they can be turned by the SM into methods that will be practically
useful from the point of view of the function to be optimized. Examples of such
situations are given in [H, I].

We associate with the feasible set $C$ a proximity function $\mathrm{Prox}_C : \Theta \to \mathbb{R}_+$,
which is an indicator of how incompatible a vector $x \in \Theta$ is with the constraints.
For any given $\varepsilon > 0$, a point $x \in \Theta$ for which $\mathrm{Prox}_C(x) \le \varepsilon$ is called an
$\varepsilon$-compatible solution for $C$. We assume that we have a feasibility-seeking algorithmic
operator $A_C : \mathbb{R}^J \to \Theta$ that defines a Basic Algorithm whose iterative step, given the
current iterate vector $x^k$, calculates the next iterate $x^{k+1}$ by

(3.2)  $x^{k+1} = A_C(x^k)$.

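Given $C \subseteq \mathbb{R}^J$, a proximity function $\mathrm{Prox}_C$, a sequence $\{x^k\}_{k=0}^{\infty} \subset \Theta$ and an $\varepsilon > 0$,
an element $x^K$ of the sequence which has the properties (i) $\mathrm{Prox}_C(x^K) \le \varepsilon$,
and (ii) $\mathrm{Prox}_C(x^k) > \varepsilon$ for all $0 \le k < K$, is called an $\varepsilon$-output of the sequence
$\{x^k\}_{k=0}^{\infty}$ with respect to the pair $(C, \mathrm{Prox}_C)$. We denote it by
$O(C, \varepsilon, \{x^k\}_{k=0}^{\infty}) = x^K$, $O$ standing for output.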

Clearly, an $\varepsilon$-output $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ of a sequence $\{x^k\}_{k=0}^{\infty}$ might or might
not exist, but if it does, then it is unique. If $\{x^k\}_{k=0}^{\infty}$ is produced by an algorithm
intended for the feasible set $C$, such as the Basic Algorithm (3.2), without a termination
criterion, then $O(C, \varepsilon, \{x^k\}_{k=0}^{\infty})$ is the output produced by that algorithm
when it includes the termination rule to stop when an $\varepsilon$-compatible solution for $C$
is reached.
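The termination rule described above can be sketched in a few lines. This is a minimal illustration only, under stated assumptions: the constraints are three toy half-spaces written as rows of $Ax \le b$, the proximity function is taken to be the Euclidean norm of the constraint violations, and the Basic Algorithm is one sweep of successive projections onto violated half-spaces. The names `prox`, `basic_step` and `eps_output` are hypothetical and do not come from the source.

```python
import math

# Toy constraints (assumed for illustration): x <= 1, y <= 1, x + y >= 0.5.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [1.0, 1.0, -0.5]

def prox(x):
    """Assumed proximity function: Euclidean norm of the violations of A x <= b."""
    viol = [max(sum(a * xi for a, xi in zip(row, x)) - bi, 0.0)
            for row, bi in zip(A, b)]
    return math.sqrt(sum(v * v for v in viol))

def basic_step(x):
    """One sweep of successive projections onto the violated half-spaces."""
    x = list(x)
    for row, bi in zip(A, b):
        violation = sum(a * xi for a, xi in zip(row, x)) - bi
        if violation > 0.0:
            scale = violation / sum(a * a for a in row)
            x = [xi - scale * a for xi, a in zip(x, row)]
    return x

def eps_output(x0, eps, max_iter=1000):
    """Return (x^K, K) for the first K with prox(x^K) <= eps, else (None, None)."""
    x = [float(v) for v in x0]
    for k in range(max_iter + 1):
        if prox(x) <= eps:
            return x, k
        x = basic_step(x)
    return None, None

x_K, K = eps_output([5.0, -3.0], eps=1e-6)
```

Because the first index $K$ with $\mathrm{Prox}_C(x^K) \le \varepsilon$ is returned, the result is unique whenever it exists, in line with the remark above; the iteration cap is only a practical safeguard.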

In order to “superiorize” such an algorithm we need it to be strongly perturbation
resilient in the sense that for every $\varepsilon > 0$ for which an $\varepsilon$-output is defined for a
sequence generated by the Basic Algorithm for every $x^0 \in \Theta$, we have also that
the $\varepsilon'$-output is defined for every $\varepsilon' > \varepsilon$ and for every sequence $\{y^k\}_{k=0}^{\infty}$ generated
by $y^{k+1} = A_C(y^k + \beta_k v^k)$, for all $k \ge 0$, where the vector sequence $\{v^k\}_{k=0}^{\infty}$
is bounded and the scalars $\{\beta_k\}_{k=0}^{\infty}$ are such that $\beta_k \ge 0$, for all $k \ge 0$, and
$\sum_{k=0}^{\infty} \beta_k < \infty$. See our recent work [H] for details.

Along with the constraints set $C \subseteq \mathbb{R}^J$, we look at the objective function
$\phi : \mathbb{R}^J \to \mathbb{R}$, with the convention that a point in $\mathbb{R}^J$ for which the value of $\phi$ is
smaller is considered superior to a point in $\mathbb{R}^J$ for which the value of $\phi$ is larger.

The essential idea of the SM is to make use of the perturbations in order
to transform a strongly perturbation resilient algorithm that seeks a
constraints-compatible solution for $C$ (i.e., is seeking feasibility) into one whose outputs are
equally good from the point of view of constraints-compatibility, but are superior
(not necessarily optimal) according to the objective function $\phi$.

This is done by producing from the Basic Algorithm another algorithm, called
its superiorized version, that makes sure not only that the $\beta_k v^k$ are bounded
perturbations, but also that $\phi(y^k + \beta_k v^k) \le \phi(y^k)$, for $k \ge L$ for some integer $L \ge 0$.
The Superiorized Version of the Basic Algorithm assumes that we have available a
summable sequence $\{\eta_\ell\}_{\ell=0}^{\infty}$ of positive real numbers (for example, $\eta_\ell = a^\ell$, where
$0 < a < 1$) and it generates, simultaneously with the sequence $\{y^k\}_{k=0}^{\infty}$ in $\Theta$,
sequences $\{v^k\}_{k=0}^{\infty}$ and $\{\beta_k\}_{k=0}^{\infty}$. The latter is generated as a subsequence of $\{\eta_\ell\}_{\ell=0}^{\infty}$,
resulting in a nonnegative summable sequence $\{\beta_k\}_{k=0}^{\infty}$. The algorithm further
depends on a specified initial point $y^0 \in \Theta$ and on a positive integer $N$. It makes use
of a logical variable called loop. The superiorized algorithm is presented next by
its pseudo-code.

The Superiorized Version of the Basic Algorithm

set k = 0
set y^k = y^0
set \ell = -1
repeat
    set n = 0
    set y^{k,n} = y^k
    while n < N
        set v^{k,n} to be a nonascending vector for \phi at y^{k,n}
        set loop = true
        while loop
            set \ell = \ell + 1
            set \beta_{k,n} = \eta_\ell
            set z = y^{k,n} + \beta_{k,n} v^{k,n}
            if \phi(z) <= \phi(y^k) then
                set n = n + 1
                set y^{k,n} = z
                set loop = false
    set y^{k+1} = A_C(y^{k,N})
    set k = k + 1
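The pseudo-code can be turned into a short runnable sketch. The following is an illustration under explicit assumptions, not the implementation used for the experiments: the constraints are three toy half-spaces, $A_C$ is one sweep of successive projections, $\phi$ is the squared Euclidean norm, the nonascending vector is the normalized negative gradient, $\eta_\ell = a^\ell$ with $a = 0.5$, and an $\varepsilon$-compatibility stopping rule is added so that the loop terminates. All function names are hypothetical. Calling the routine with N = 0 switches the perturbations off and gives the plain Basic Algorithm for comparison.

```python
import math

# Toy constraints (assumed): x <= 1, y <= 1, x + y >= 0.5, as rows of A y <= b.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [1.0, 1.0, -0.5]

def prox(y):
    """Assumed proximity function: Euclidean norm of the constraint violations."""
    viol = [max(sum(a * yi for a, yi in zip(row, y)) - bi, 0.0)
            for row, bi in zip(A, b)]
    return math.sqrt(sum(v * v for v in viol))

def A_C(y):
    """Basic Algorithm step: one sweep of projections onto violated half-spaces."""
    y = list(y)
    for row, bi in zip(A, b):
        violation = sum(a * yi for a, yi in zip(row, y)) - bi
        if violation > 0.0:
            scale = violation / sum(a * a for a in row)
            y = [yi - scale * a for yi, a in zip(y, row)]
    return y

def phi(y):
    """Objective function: squared Euclidean norm (smaller is superior)."""
    return sum(yi * yi for yi in y)

def nonascending(y):
    """Normalized negative gradient of phi, or the zero vector at the origin."""
    norm = math.sqrt(phi(y))
    return [-yi / norm for yi in y] if norm > 0.0 else [0.0] * len(y)

def superiorized(y0, N, eps, a=0.5, max_iter=500):
    """Superiorized Version with an added eps-compatibility stopping rule."""
    y, ell = list(y0), -1
    for k in range(max_iter + 1):
        if prox(y) <= eps:
            return y, k
        y_kn, n = list(y), 0
        while n < N:
            v = nonascending(y_kn)
            loop = True
            while loop:
                ell += 1
                beta = a ** ell                  # beta_{k,n} = eta_ell
                z = [yi + beta * vi for yi, vi in zip(y_kn, v)]
                if phi(z) <= phi(y):             # compare with phi(y^k)
                    n += 1
                    y_kn = z
                    loop = False
        y = A_C(y_kn)
    return None, None

y_plain, k_plain = superiorized([5.0, -3.0], N=0, eps=1e-6)  # no perturbations
y_sup, k_sup = superiorized([5.0, -3.0], N=1, eps=1e-6)
```

On this toy problem both runs end at an $\varepsilon$-compatible point, and the superiorized run ends at a point with a smaller value of $\phi$. This illustrates the intended behavior on one example only and is not a general guarantee.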

Analysis of the Superiorized Version of the Basic Algorithm [H, I] shows
that it produces outputs that are essentially as constraints-compatible as those
produced by the original (not superiorized) Basic Algorithm. However, due to the
repeated steering of the process toward reducing the value of the objective function
$\phi$, we can expect that the output of the Superiorized Version will be superior (from
the point of view of $\phi$) to the output of the original algorithm. A recent work that
includes results about the SM appears in this volume [D].

4.  Demonstrative  examples 

The  anonymized  pelvic  planning  CT  (computed  tomography)  of  a  prostate 
cancer  patient  was  employed  for  the  IMRT  treatment  planning  using  the  proposed 
method. Seven equispaced fields were used for targeting the PTV. The dose
constraints were set using the RTOG 0815 randomized trial protocol [J].

Our preliminary testing of the approach was done by comparing the outputs of
a TV-superiorization algorithm with those of an otherwise identical algorithm that aimed
only at satisfying the dose constraints, without applying the SM. Here $A_C$ was
chosen to be ART for inequalities [K]. It was proven to be perturbation resilient
in [L].

From a radiation delivery standpoint, a solution that is easy to deliver is one
that has a piecewise constant intensity-beamlet map. The reason has to do with
the physical constraints coming from the MLCs, which require that the beamlets
have a small number of signal levels. It was, therefore, suggested in the literature
to use total variation (TV) to force the solution to be piecewise constant [M, N].
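To make the role of TV concrete, the sketch below computes a total variation value for a small 2D beamlet intensity map. The source does not state which TV formula was used, so the isotropic discrete form (square root of the sum of squared vertical and horizontal differences, summed over the map) is an assumption, as are the function name and the toy maps.

```python
import math

def total_variation(img):
    """Isotropic discrete TV of a 2D list of intensities (assumed definition)."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dv = img[i + 1][j] - img[i][j]   # vertical difference
            dh = img[i][j + 1] - img[i][j]   # horizontal difference
            tv += math.sqrt(dv * dv + dh * dh)
    return tv

# A piecewise constant map with two intensity levels, and an oscillating map.
two_level = [[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2]]
oscillating = [[1, 2, 1, 2], [2, 1, 2, 1], [1, 2, 1, 2]]
tv_two_level = total_variation(two_level)
tv_oscillating = total_variation(oscillating)
```

The two-level map has a much smaller TV than the oscillating one, which is why reducing TV tends to favor maps with few intensity levels.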

We performed two experiments with different starting conditions. For the first
experiment, we initiated the algorithm with the zero vector of dose weights and
for the second experiment all dose weights were given the value 10. Tables 1 and
2 summarize the results for the two experiments and in Figure 1 we present the
associated DVH (dose-volume histogram) curves.

For the first experiment, the TV-superiorization algorithm produced a solution
that met the acceptance criteria after 12 iterations whereas the feasibility-seeking
algorithm was not able to reach an acceptable solution after this number of
iterations. For the second experiment, the TV-superiorization algorithm reached an
acceptable solution even faster, i.e., after 7 iterations, and the feasibility-seeking
algorithm again failed some of the acceptance criteria after this number of iterations.

Table 1. RTOG 0815 acceptance criteria and results of experiment 1
described in Section 4 (TVS stands for TV-superiorization)

Acceptance criteria                                              Exp 1 with TVS   Exp 1 without TVS
---------------------------------------------------------------  --------------   -----------------
PTV: Min Allowed Dose: 75.24 Gy                                  75.24 Gy         56.13 Gy
PTV: Max Allowed Dose: 84.74 Gy                                  84.69 Gy         89.42 Gy
Rectum: No more than 50% of the volume should exceed 60.00 Gy    34.50 %          8.50 %
Rectum: Max Dose                                                 82.64 Gy         82.71 Gy

Table 2. RTOG 0815 acceptance criteria and results of experiment 2
described in Section 4 (TVS stands for TV-superiorization)

Acceptance criteria                                              Exp 2 with TVS   Exp 2 without TVS
---------------------------------------------------------------  --------------   -----------------
PTV: Min Allowed Dose: 75.24 Gy                                  77.80 Gy         76.15 Gy
PTV: Max Allowed Dose: 84.74 Gy                                  84.71 Gy         87.63 Gy
Rectum: No more than 50% of the volume should exceed 60.00 Gy    36.90 %          40.50 %
Rectum: Max Dose                                                 84.09 Gy         87.25 Gy
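The pass/fail reading of Tables 1 and 2 can be reproduced by checking the reported values against the three criteria that carry an explicit limit. This is only a bookkeeping sketch: the function and variable names are hypothetical, and the rectum maximum dose is left out because the tables quote no limit for it.

```python
# Limits quoted in Tables 1 and 2 (RTOG 0815 acceptance criteria).
PTV_MIN_GY = 75.24
PTV_MAX_GY = 84.74
RECTUM_MAX_PCT_OVER_60GY = 50.0

def meets_criteria(ptv_min_gy, ptv_max_gy, rectum_pct_over_60gy):
    """True if all three limited criteria are satisfied."""
    return (ptv_min_gy >= PTV_MIN_GY
            and ptv_max_gy <= PTV_MAX_GY
            and rectum_pct_over_60gy <= RECTUM_MAX_PCT_OVER_60GY)

# Reported values: (PTV min dose, PTV max dose, rectum % of volume above 60 Gy).
results = {
    "exp1_tvs": (75.24, 84.69, 34.50),
    "exp1_no_tvs": (56.13, 89.42, 8.50),
    "exp2_tvs": (77.80, 84.71, 36.90),
    "exp2_no_tvs": (76.15, 87.63, 40.50),
}
verdicts = {name: meets_criteria(*vals) for name, vals in results.items()}
```

With these inputs both TV-superiorization runs pass, while the runs without superiorization fail on the PTV dose limits, matching the description in the text.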

5.  Conclusions 

Our proposed method successfully produced conformal solutions that met the
acceptance criteria, whereas an otherwise identical algorithm without superiorization
failed to do so with the same number of iterations. Future work will assess
the computational gain of the superiorization method compared to a conventional
one and investigate its utility for the computationally more complex problems
that arise in modulated techniques for arc therapy.

Figure 1. Dose volume histograms (DVH) of the two experiments.
Solid lines represent the algorithm with TV-superiorization
(broken lines represent no superiorization). The first (top) took 12
iterations and the second (bottom) took 7 iterations. Exact numbers
are given in Tables 1 and 2.

The problem of reconstruction of slices and volumes from 1D and 2D projections has
arisen in a large number of scientific fields (including computerized tomography, electron
microscopy, X-ray microscopy, radiology, radio astronomy and holography). Many different
methods (algorithms) have been suggested for its solution.

In this paper we present a software package, SNARK09, for reconstruction of 2D images
from their 1D projections. In the area of image reconstruction, researchers often desire to
compare two or more reconstruction techniques and assess their relative merits. SNARK09
provides a uniform framework to implement algorithms and evaluate their performance. It
has been designed to treat both parallel and divergent projection geometries and can either
create test data (with or without noise) for use by reconstruction algorithms or use data
collected by other software or a physical device. A number of frequently-used classical
reconstruction algorithms are incorporated. The package provides a means for easy
incorporation of new algorithms for their testing, comparison and evaluation. It comes with tools
for statistical analysis of the results and ten worked examples.

©  2013  Elsevier  Ireland  Ltd.  All  rights  reserved. 


1.  Introduction 

The need for reconstruction from projections occurs in many
biomedical areas. Practical applications, such as computerized
tomography (CT), positron emission tomography (PET)
and X-ray microscopy use physically collected projection data
to reconstruct real objects. Simulation packages (SNARK09
[1,2] is an example) allow for thorough testing of effects of
various factors on the projection data and on the outputs of
reconstruction algorithms. For example, in SNARK09, various
sources of noise that occur during X-ray data collection can be
simulated separately so that their effects can be studied and
understood. (The name SNARK09 originates from the Lewis
Carroll nonsense poem “The Hunting of the Snark.”)


SNARK09 provides a total framework for reconstruction
from projections for both simulated and real data, as well
as statistical evaluation of the results. Mathematical phantoms
can be generated either as piecewise constant objects,
appropriate for materials science, or as objects containing
inhomogeneities to better simulate biological materials.
Projection datasets can be obtained based on mathematically
described phantoms. The user has options of investigating
various scanner modes, including noise models comparable
to actual imaging devices. There is also an option of using
projection data obtained from external sources (a medical
scanner or data generated by other software). The package
comes with several built-in reconstruction algorithms. It
provides either pixels or blobs as basis functions. Users have a
means of implementing their own reconstruction algorithms.
The results of the reconstructions can be evaluated using a
statistically sound methodology built into the package.

1.1.  Features  of  the  package 

Below  we  give  a  short  summary  of  some  features  included  in 
SNARK09.  This  is  not  intended  to  be  a  complete  list,  but  rather 
a  representative  sample  of  what  SNARK09  has  to  offer. 

Polychromatic  and  monochromatic  X-ray  simulation.  The 
package  provides  a  means  of  simulating  X-rays  with  either 
a monochromatic or polychromatic spectrum at energy levels
specified by the user. X-rays generated by most medical
imaging  devices  are  polychromatic  in  nature. 

Beam  hardening  correction.  Due  to  the  polychromatic  nature 
of  X-rays,  beam  hardening  correction  is  needed  to  compensate 
for  different  levels  of  X-ray  attenuation.  Such  a  correction  is 
available  in  SNARK09. 

Projection  computation.  The  projection  data  through  the 
phantom  can  be  computed  either  based  on  a  mathematically 
defined  phantom  (line  integrals  through  geometrical  features), 
or  a  digitized  phantom  (using  lengths  of  intersections  of  a  ray 
with  each  pixel  of  the  phantom). 

Digital difference analyzer (DDA). Projections through digitized
phantoms can be computed very fast using a DDA. This
method,  originally  developed  for  drawing  lines  using  a  digital 
plotter  [3],  is  used  for  computation  of  pixel  intersections  by  a 
single  line. 

Basis  functions.  Both  pixels  and  blobs  [4,5]  can  be  used  as 
the  basis  functions  for  mathematical  representation  of  the 
reconstructions. Blobs have been shown to be superior for representation
of biological structures due to their smoothness.

Reconstruction  algorithms.  The  package  provides  a  large 
selection  of  reconstruction  algorithms  based  on  transform 
methods  and  series  expansion  methods  with  parameters 
selected  by  the  user.  User-defined  reconstruction  algorithms 
can  also  be  easily  implemented  and  used  in  SNARK09.  The 
code for these algorithms has to be written in C++. The programmers
have at their disposal a selection of functions and
objects  already  implemented  in  SNARK09. 

Routines  and  classes  for  use  in  user-defined  algorithms. 
Researchers  who  need  to  implement  their  own  reconstruction 
algorithms  have  a  large  range  of  routines  and  classes  available 
for their use. One example is a function that, for a given projection
angle and ray number, computes the pixels intersected by
that  ray.  This  greatly  simplifies  implementation  of  additional 
algorithms. 

Statistical comparison of algorithms. The built-in and user-defined
algorithms can be evaluated for their superiority for
a given task. SNARK09 uses an ensemble of phantoms from
which a particular phantom is (randomly) chosen and is reconstructed
from its (randomly generated) projection data by the
algorithms to be compared. Statistical evaluation is performed
based on this multiplicity of reconstructions, which allows us
to assign a statistical significance by which we can reject the
null hypothesis that two algorithms are equally good in favor
of the alternative hypothesis that one of them is better than
the other.

Figure of merit (FOM). A meaningful statistical evaluation
of reconstruction methods has to be done in terms of a specific
task at hand. The package provides several built-in FOMs,
as well as a means for the user to provide their own definitions.
One of the built-in FOMs is imagewise region of interest,
which has been shown to correlate well with the performance
of humans for detection of small tumors in lung tissue [6].

Use of simulated or real data. The package can either simulate
the data generation based on a mathematically defined
phantom,  or  use  data  obtained  by  another  simulation  package 
or  an  actual  device. 

Graphical user interface. SNARK09 runs from the command
line. There are two other programs, SNARK09Input and
SNARK09Display,  that  provide  assistance  in  creation  of  input 
files  and  visualization  of  sinograms  and  of  reconstructions. 
SNARK09Display  has  also  the  capability  of  displaying  profile 
lines  of  the  reconstructed  images  as  well  as  plotting  several 
built-in  evaluation  parameters. 

1.2.  Related  work 

There  are  many  packages  available  for  reconstruction  from 
projections.  Some  of  them,  like  SNARK09,  are  designed  to  work 
with 2D images and their 1D projections; some are designed
to work with 3D objects and their 2D projections; some can
do both. The packages provide a varied selection of simulation
capabilities, choices of reconstruction algorithms and
evaluation  techniques.  Some  are  developed  for  the  purpose  of 
reconstruction  from  real  data  and  for  obtaining  results  used  in 
practical  studies  in  medicine  or  biology.  Others,  like  SNARK09, 
provide  means  of  evaluation  and  examination  of  different 
effects  that  occur  during  the  imaging  process  and  are  testbeds 
for  new  and  existing  reconstruction  methodologies.  Below,  we 
mention  a  few  of  these  packages  that  we  are  most  familiar 
with  as  examples;  it  is  by  no  means  a  complete  review  of  what 
is  available. 

jSNARK [7] incorporates a large subset of the capabilities of
SNARK09 for reconstructing 2D objects from their 1D projections,
but it also extends those capabilities to reconstructing
3D  objects  from  their  2D  projections.  It  is  written  in  Java 
allowing  for  greater  platform  flexibility  and  for  user-defined 
routines  written  as  plugins. 

There  are  many  software  packages  designed  primarily  for 
reconstruction  from  transmission  electron  microscopy  data. 
Their  goal  is  to  produce  reconstructions  that  can  be  used 
by  biologists,  rather  than  to  allow  experimentation  with  new 
algorithms. Three examples of such packages are Xmipp [8],
SPIDER [9] and IMOD [10].

MATLAB® provides some very basic tomography routines
and phantoms. Many researchers write their own routines
using the MATLAB® environment. Two examples of such code
available for free download are the Image Reconstruction Toolbox
[11] and AIR Tools [12] that implement iterative algebraic
reconstruction methods.

STIR  [13]  is  open-source  software  for  use  in  tomographic 
imaging.  Its  aim  is  to  provide  a  multi-platform  object-oriented 
framework  for  all  data  manipulations  in  tomographic  imaging. 
Currently,  the  emphasis  is  on  iterative  image  reconstruction 
in  positron  emission  tomography. 

SNARK09 offers much more than just the reconstruction
routines. It provides a basic platform that can be used by
researchers (with little or no extra coding) to study,
simulate and perform data collection, reconstruction and
statistical analysis.

1.3.  Outline  of  the  rest  of  this  paper 

In this paper we review the functionality of the SNARK09 software
package. We discuss phantom creation, data collection
and reconstruction methods in Section 2. The system description
follows in Section 3. We go over an example of the use of
the package in Section 4. Finally, mode of availability and system
requirements are covered in Section 5 and future work in
Section 6.


2.  Computational  methods  and  theory 

The  reconstruction  problem  may  be  stated  roughly  as  follows: 
given  approximations  (based  on  physical  measurements)  of 
the  real  ray  sums  of  a  picture  for  a  number  of  rays,  estimate 
the  N  x  N  digitization  of  the  picture.  The  SNARK09  package 
provides  a  means  of  simulating  each  step  of  this  process.  The 
details  of  this  are  discussed  in  this  section. 

SNARK09  deals  with  pictures  defined  over  the  2D  plane  of 
points  (x,  y)  in  some  assumed  fixed  coordinate  system.  To  be 
exact,  a  picture  has  two  components: 

(1)  the  picture  region,  which  is  a  square  whose  center  is  at 
the  origin  of  the  coordinate  system  and  whose  sides  are 
parallel  to  its  axes; 

(2) a function of two variables whose value is zero outside the
picture  region. 

Identical  functions  may  give  rise  to  different  pictures  if  the 
picture  regions  are  different. 

We often refer to the value $f(x, y)$ of the picture $f$ at the
point $(x, y)$ as the density of $f$ at $(x, y)$. Within SNARK09, $f(x, y)$
is approximated by a grid $G$ (which is a finite set $\{(g_1, h_1), \ldots,
(g_J, h_J)\}$ of points in the plane), a basic basis function $b$ (which is
just a function of two variables), and coefficients $c_j$ associated
with each grid point $(g_j, h_j)$ as follows. For $1 \le j \le J$, each of the
functions

$b_j(x, y) = b(x - g_j, y - h_j)$  (1)

is called a basis function and $f$ is defined by

$f(x, y) = \sum_{j=1}^{J} c_j b_j(x, y)$;  (2)

i.e., as an expansion over the basis functions $b_j$ with
coefficients $c_j$. SNARK09 allows the use of two different kinds
of basis functions: pixels and blobs [4,5], each with its own
type of grid.

The exact definition of a pixel basis function depends on
a variable called PIXSIZ that is specified by the input to
SNARK09. The pixel basis function is then defined to have the
value 1 at points strictly inside the square that is centered
at the origin and that has edges of length PIXSIZ parallel to
the coordinate axes, and to have the value 0 at points strictly
outside this square. The associated grid $G$ is defined, by an
additional input-specified variable called NELEM, to be the set

$\{(m \times \mathrm{PIXSIZ}, n \times \mathrm{PIXSIZ}) \mid m \text{ and } n \text{ integers}, \max\{|m|, |n|\} < \mathrm{NELEM}/2\}$,  (3)

where $|\cdot|$ denotes the absolute value. This approach subdivides
the picture region into NELEM$^2$ equal squares. Each of
these smaller squares is called a pixel (short for picture element).
In the interior of a pixel, the density of the function, as
defined by (2), is uniform. An arbitrary picture can be approximated
by such an expansion by simply assigning to each $c_j$
the average density of the picture in the corresponding pixel;
such an approximation is referred to as the NELEM$^2$ digitization
of the picture. In SNARK09, the picture region (which is sometimes
referred to as the reconstruction region) is determined by
the program (based on the input-specified variables PIXSIZ
and NELEM) as the square whose corners have coordinates
$(c, c)$, $(-c, c)$, $(-c, -c)$, $(c, -c)$, where

$c = \dfrac{\mathrm{PIXSIZ} \times \mathrm{NELEM}}{2}$.  (4)
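As a concrete reading of (3) and (4), the sketch below enumerates the pixel grid and the half-width of the picture region. It is an illustration only: the function names are hypothetical and are not SNARK09 routines, and the example uses an odd NELEM, for which the strict inequality in (3) yields exactly NELEM$^2$ grid points.

```python
def pixel_grid(pixsiz, nelem):
    """Grid points of Eq. (3): (m*PIXSIZ, n*PIXSIZ) with max(|m|, |n|) < NELEM/2."""
    half = nelem / 2
    indices = [m for m in range(-nelem, nelem + 1) if abs(m) < half]
    return [(m * pixsiz, n * pixsiz) for n in indices for m in indices]

def region_half_width(pixsiz, nelem):
    """The value c of Eq. (4); the picture region is the square [-c, c] x [-c, c]."""
    return pixsiz * nelem / 2

grid = pixel_grid(0.2, 5)         # illustrative values: PIXSIZ = 0.2, NELEM = 5
c = region_half_width(0.2, 5)
```

Each pixel is a square of side PIXSIZ centered at a grid point, so the outermost pixels end exactly at the boundary of the picture region.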

A blob basis function also depends on some input-specified
variables; however, SNARK09 will automatically calculate reasonable
values for these parameters, relieving the user from
the need of having to study the mathematical definitions that
now follow. Blob basis functions are generalizations of the well-known
window functions in digital signal processing called
Kaiser-Bessel [4,5]; they are circularly symmetric, have nonzero
values only in a circular disk around the origin, and smoothly
decrease from a positive value at the origin to zero at the edge
of the disk. The exact definition depends on the three variables
SUPPORT, DELTA and SHAPE as follows:

$b(x, y) = \begin{cases} \dfrac{\sqrt{3} \times \mathrm{DELTA}^2 \times \mathrm{SHAPE} \times (1 - r^2)}{4\pi \times \mathrm{SUPPORT}^2 \times I_3(\mathrm{SHAPE})} \times I_2\left(\mathrm{SHAPE} \times \sqrt{1 - r^2}\right), & \text{if } 0 \le r \le 1, \\ 0, & \text{otherwise,} \end{cases}$  (5)

where $I_i$ denotes the modified Bessel function of the first
kind of order $i$ and $r = \sqrt{x^2 + y^2}/\mathrm{SUPPORT}$. The grid $G$ that
determines the blob basis functions using (5) is hexagonal [14];
it is defined as the set of all points in the set

$\left\{ \left( \dfrac{m}{2} \times \mathrm{DELTA}, \dfrac{\sqrt{3}\, n}{2} \times \mathrm{DELTA} \right) \;\middle|\; m \text{ and } n \text{ are integers, and } m + n \text{ is even} \right\}$  (6)

that are also inside the reconstruction region specified
by (4).
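Equations (5) and (6) can be evaluated directly, as in the following sketch. It is not SNARK09 code: the function names are hypothetical, the modified Bessel functions are computed from their power series for integer order, the parameter values SUPPORT = 2, DELTA = 1 and SHAPE = 10.4 are illustrative assumptions, and points on the boundary of the reconstruction region are counted as inside.

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function of the first kind, integer order, by power series."""
    return sum((x / 2.0) ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def blob(x, y, support, delta, shape):
    """Blob basis function of Eq. (5); assumes shape > 0."""
    r = math.hypot(x, y) / support
    if r > 1.0:
        return 0.0
    scale = (math.sqrt(3.0) * delta ** 2 * shape * (1.0 - r * r)) / (
        4.0 * math.pi * support ** 2 * bessel_i(3, shape))
    return scale * bessel_i(2, shape * math.sqrt(1.0 - r * r))

def hex_grid(delta, c):
    """Points of Eq. (6) lying in the square [-c, c] x [-c, c] of Eq. (4)."""
    m_max = int(2 * c / delta)
    n_max = int(2 * c / (math.sqrt(3.0) * delta))
    return [(0.5 * m * delta, 0.5 * math.sqrt(3.0) * n * delta)
            for n in range(-n_max, n_max + 1)
            for m in range(-m_max, m_max + 1)
            if (m + n) % 2 == 0]

SUPPORT, DELTA, SHAPE = 2.0, 1.0, 10.4   # illustrative values only
center_value = blob(0.0, 0.0, SUPPORT, DELTA, SHAPE)
points = hex_grid(DELTA, 2.0)
```

The parity condition on $m + n$ is what makes the grid hexagonal: every grid point has its nearest neighbors at distance DELTA, for example $(\mathrm{DELTA}, 0)$ and $(\mathrm{DELTA}/2, \sqrt{3}\,\mathrm{DELTA}/2)$ relative to the origin.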

2.1.  Creation  of  a  phantom 

In practical applications one wishes to reconstruct a real
object from its projections. During the development of reconstruction
methods, though, it is preferable to work with
mathematically described objects, called phantoms. The reason
for this is that with real objects evaluation of the accuracy
of a reconstruction method is practically impossible. The purpose
of imaging and reconstruction is to visualize an object
that cannot be seen otherwise (due to its size or because the
internal structure is desired, for example, the internal structures
of a cell or a cross section of a human head). Use of
mathematically defined phantoms allows for evaluation of
quality of reconstruction because these objects are known.
Computer simulations with phantoms also allow for investigation
of various phenomena occurring during both imaging
and reconstruction separately from any other phenomena.

A phantom is a picture on which we wish to test reconstruction
algorithms or data collection methods. In SNARK09
the phantom is put together by superimposing a number of
elemental objects. There are five different types of elemental
objects available: rectangles, ellipses, isosceles triangles, segments
of circles and sectors of circles. The elemental objects
are illustrated in Fig. 1. Each elemental object is described by
its position in the plane (denoted by CX, CY), size along the
two perpendicular directions (denoted by U, V), orientation
(denoted by ANG) and density (which, for example, in case
of computerized tomography represents the linear attenuation
coefficient). They are allowed to overlap, in which case
the densities of overlap areas are added together. Examples
of phantoms that can be created from these simple elemental
objects are shown in Fig. 2(b) (this is a phantom based on
images, such as the one shown in Fig. 2(a) from [15], of an
actual cross section of the human thorax; its exact specification
is given in Table 1), Fig. 3(a) (which is based on a phantom
in [16] designed for testing algorithms with data available only
from a limited angular range) and Fig. 3(b) (which is a head
phantom from [17]). The widely used Shepp-Logan phantom
[18] can also be described using the elemental objects provided
by SNARK09.

Fig. 1 - Elemental objects used to construct phantoms in
SNARK09: rectangle (rect), ellipse (elip), triangle (tria),
segment of a circle (segm) and sector of a circle (sect).

Fig. 2 - (a) CT image of a cross section of the human thorax
(reproduced with permission from [15]). (b) 255 x 255
digitization of a thorax phantom created based on real
cross sections such as the one shown in (a).

The N x N digitization of the phantom is an N x N array of
pixels, where N is a user-specified integer. Each pixel’s density
is determined as the average of the densities at K x K
uniformly-spaced points within a pixel. K is a user-specified
integer. As the value of K increases, the digitized phantom
resembles the mathematical phantom more closely. The density
assigned to a pixel can be expressed as a sum

$\dfrac{1}{K^2} \sum_{k=1}^{K^2} \sum_{j=1}^{J} d_j \delta_{k,j}$,  (7)

where $J$ is the number of elemental objects in the phantom, $d_j$
is the density of the $j$th elemental object, and $\delta_{k,j} = 1$ if the $k$th
of the $K^2$ points in the pixel is in the $j$th elemental object and
it is zero otherwise. Two possible digitizations, with K = 1 and
K = 13, of a disk phantom are shown in Fig. 4.

Table 1 - Basic thorax phantom description used in SNARK09, see Fig. 2(b). The linear attenuation coefficients (LAC) in
this table are specified for a single energy level of 60 keV. To obtain the actual LAC at a point, the LACs provided in the
table should be added together for all objects that contain that point. For example, the lungs, which are inside the
smaller ellipse specified for the thorax (second row), have LAC value 0.196 - 0.147 = 0.049; this is in units of cm$^{-1}$.
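The averaging over K x K points can be sketched as follows. This is an illustration, not SNARK09 code: the function name is hypothetical, elemental objects are represented by a membership test and a density, and the K x K sample points are placed at the centers of a uniform K x K subdivision of each pixel, which is an assumption since the exact placement is not stated here. The disk of radius 1.2 and the 5 x 5 grid are toy values.

```python
import math

def digitize(objects, n, pixsiz, k):
    """n x n digitization; each pixel averages the density at k x k sample points.

    objects is a list of (inside, density) pairs, where inside(x, y) -> bool.
    Densities of overlapping objects are added, as for SNARK09 phantoms.
    """
    c = n * pixsiz / 2.0                      # half-width of the picture region
    image = []
    for row in range(n):
        line = []
        for col in range(n):
            x0 = -c + col * pixsiz            # left edge of the pixel
            y0 = c - (row + 1) * pixsiz       # bottom edge of the pixel
            total = 0.0
            for a in range(k):
                for b in range(k):
                    x = x0 + (a + 0.5) * pixsiz / k
                    y = y0 + (b + 0.5) * pixsiz / k
                    total += sum(d for inside, d in objects if inside(x, y))
            line.append(total / (k * k))
        image.append(line)
    return image

disk = [(lambda x, y: x * x + y * y < 1.2 ** 2, 1.0)]   # toy disk phantom
coarse = digitize(disk, 5, 1.0, 1)     # K = 1: each pixel is either 0 or 1
fine = digitize(disk, 5, 1.0, 13)      # K = 13: edge pixels get fractional values
```

With K = 1 a pixel takes the density at its center only, so the disk edge is jagged; with K = 13 the edge pixels receive intermediate values and the total mass approaches the area of the disk.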

In order to obtain phantoms that resemble actual biological
objects more closely, SNARK09 has an option of adding local
inhomogeneities to phantoms. Using piecewise constant phantoms
may lead to misleading conclusions about the efficacy
of an algorithm in practice, because biological structures are
far from being piecewise constant [19]. Examples of a piecewise
constant phantom and one with local inhomogeneities
are presented in Fig. 3.

Phantoms in SNARK09 can carry information about attenuation
at different energy levels of polychromatic X-ray
radiation. The way such information is made use of is discussed
in Section 2.2.3.

SNARK09 is not designed to work with “real” images, like
the one in Fig. 2(a). One could create a phantom that consists
of as many small squares as there are pixels in such an image,
with the densities corresponding to the grayscale values in the
image. This is not efficient and not advisable, and produces
data biased toward reconstructions based on the pixelized
images.

2.2.  Data  collection 

SNARK09 is capable of simulating several modes of data collection
used in various applications such as computerized
tomography (CT) and positron emission tomography (PET).
We first describe some general ideas and then specify how
SNARK09 simulates the CT and PET modes of data collection.
These simulations are based closely on the actual behavior of
the instruments (see, for example, [17, Chapter 4] for CT and
[20] for PET) and thus provide us with a tool capable of predicting
the performance of such instruments in practice. As
explained below (in particular in Section 3.1.1), once a projection
data set is generated, it is stored internally (together
with all the information that was utilized for its generation) to
be used repeatedly by other processes such as reconstruction
algorithms.

Fig. 3 - Examples of phantoms: (a) 241 x 241 digitization of
a phantom based on [16]. It was designed for testing
algorithms with data available only from a limited angular
range. (b) 245 x 245 digitization of a head phantom from
[17], with local inhomogeneities present.

2.2.1.  Computation  of  ray  sums 

Projection  data  collection  is  simulated  by  computation  of 
approximate  line  integrals  through  the  image  according  to  the 
options  indicated  in  the  input  file.  The  actual  line  integrals 
are  approximated  by  summations  of  products  of  densities  and 
lengths  of  intersection  of  all  the  elemental  objects  intersected 
by  the  given  ray.  There  are  two  kinds  of  rays  available: 

(1)  a  line  ray,  which  is  a  straight  line,  and 

(2)  a  strip  ray,  which  is  a  region  of  the  plane  between  a  pair  of 
parallel  straight  lines. 

Given  a  picture  and  a  ray,  the  real  ray  sum  is  the  integral  of  the 
picture  along  the  ray  (either  a  line  or  a  strip).  This  is  computed 
using  the  geometric  description  of  the  elemental  objects,  not 
the  digitized  version  of  the  phantom. 
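
The real ray sum for a line ray can be illustrated with a minimal sketch, assuming a phantom made only of disks whose densities add where they overlap. The function names and the (theta, t) parameterization of a line are illustrative assumptions and are not SNARK09 identifiers.

```python
import math

def chord_length(cx, cy, radius, theta, t):
    """Length of the intersection of a line with a disk.

    The line is the set of points (x, y) with
    x*cos(theta) + y*sin(theta) = t.
    """
    # Perpendicular distance from the disk centre to the line.
    dist = abs(cx * math.cos(theta) + cy * math.sin(theta) - t)
    if dist >= radius:
        return 0.0
    return 2.0 * math.sqrt(radius * radius - dist * dist)

def real_ray_sum(disks, theta, t):
    """Sum of density * intersection length over all disks hit by the ray.

    Each disk is a tuple (cx, cy, radius, density).
    """
    return sum(d * chord_length(cx, cy, r, theta, t) for cx, cy, r, d in disks)

# A large disk of density 1.0 with a small, slightly denser insert.
phantom = [(0.0, 0.0, 2.0, 1.0), (0.5, 0.0, 0.5, 0.3)]
central = real_ray_sum(phantom, 0.0, 0.0)   # line x = 0
shifted = real_ray_sum(phantom, 0.0, 0.5)   # line x = 0.5
```

Because the chord lengths come from the geometric description of the disks, the result does not depend on any digitization of the phantom.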

SNARK09 also deals with pseudo ray sums; these are defined only for expansions of the form of Eq. (2). In the line ray case, the pseudo ray sum is the real ray sum of the picture defined by the function f of Eq. (2) (it uses the intersections of the ray with the basis function, either pixel or blob) and a picture region large enough to contain all points at which the value of f is not zero. (Since the grid G is finite, see paragraph above Eq. (1), it is always possible to find such a picture region.) In the case of strip rays, the pseudo ray sum is defined as

$$\sum_{j \in S} c_j \times \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} b(x, y)\, dx\, dy, \qquad (8)$$

where S contains exactly those j (1 ≤ j ≤ J) for which (g_j, h_j) is in the strip. Note that the integral in the above equation depends only on the basic basis function; it is equal to the area of the pixel in the pixel case and the area under the curve of the blob in the blob case.
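
The strip-ray pseudo ray sum of Eq. (8) can be sketched as follows. This is a minimal illustration under stated assumptions: the strip is described by the angle of its normal, the offset t of its center line, and its width; the half-open membership test is a choice made here so that abutting strips never share a grid point. None of the names are SNARK09 identifiers.

```python
import math

def strip_pseudo_ray_sum(coeffs, grid_points, theta, t, width, basis_integral):
    """Pseudo ray sum of Eq. (8) for one strip ray.

    The strip is centred on the line x*cos(theta) + y*sin(theta) = t.
    basis_integral is the integral of the basic basis function b over the
    plane (the pixel area in the pixel case).
    """
    nx, ny = math.cos(theta), math.sin(theta)
    half = width / 2.0
    total = 0.0
    for c, (g, h) in zip(coeffs, grid_points):
        offset = g * nx + h * ny - t
        # Grid point (g, h) belongs to the set S when it lies in the strip.
        if -half <= offset < half:
            total += c
    return total * basis_integral

# 3 x 3 grid of unit pixels centred on the origin, coefficients 1..9.
grid_points = [(x, y) for y in (-1.0, 0.0, 1.0) for x in (-1.0, 0.0, 1.0)]
coeffs = [float(k) for k in range(1, 10)]
middle_column = strip_pseudo_ray_sum(coeffs, grid_points, 0.0, 0.0, 1.0, 1.0)
all_strips = sum(
    strip_pseudo_ray_sum(coeffs, grid_points, 0.0, t, 1.0, 1.0)
    for t in (-1.0, 0.0, 1.0)
)
```

Only the membership of grid points in the strip matters; the shape of the basis function enters solely through the constant `basis_integral`.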

Pseudo  ray  sums  are  used  in  iterative  reconstruction 
algorithms  in  which  the  objective  of  the  algorithm  is  the 
estimation of the coefficients c_j in Eq. (2). When SNARK09 is
used  for  simulating  how  a  physical  device  produces  projection 
data,  real  ray  sums  should  be  used  to  obtain  high  accuracy. 
Depending  on  the  parameters  of  the  basis  functions,  pseudo 
ray  sums  may  be  very  inaccurate. 


Fig.  4  -  Two  25  x  25  digitizations  of  a  mathematically  described  disk  phantom,  using  (a)  K  =  1  and  (b)  K  =  13. 

2.2.2. Geometry of data collection

SNARK09  is  not  capable  of  handling  an  arbitrary  arrangement 
of  rays,  but  it  can  handle  a  number  of  arrangements  of  rays 
that  are  typical  of  what  one  might  come  across  in  practice. 

The  set  of  all  rays  along  which  data  are  collected  is  divided 
into  a  number  of  subsets,  called  projections.  The  number  of 
projections  and  number  of  rays  per  projection  are  user-defined 
values,  although  SNARK09  can  compute  the  number  of  rays 
sufficient  to  cover  the  entire  area  of  the  phantom.  There  are 
two  basically  different  modes  of  data  collection:  divergent  and 
parallel. 

In divergent geometry (Fig. 5(a)), a projection consists of a set of line rays that go through a common point (the source position). In all the projections, the source is at a fixed distance from the origin. The angle between the line from origin to source and the x-axis (marked THETA in Fig. 5(a)) is called the projection angle. The rays in one projection connect the source to points (detectors) that lie either on an arc of a circle whose center is at the source or on a straight line tangent to that circle. In either case, one of the detectors (marked C in Fig. 5(a)) lies on the line connecting the source to the origin, at a distance STOD (for Source TO Detector) from the source. The other detectors are spaced symmetrically at equal intervals on the two sides of C, either on the arc whose center is the source or on the tangent line to this arc at C. The spacing between detectors (the length of the arc or that of the tangent line between two neighboring detectors) is denoted by PINC (specified by the user in the input file to SNARK09).
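
The divergent arrangement can be sketched in code. The snippet below is an illustration only: `source_radius` (the fixed source-to-origin distance), `half_count` (the number of detectors on each side of C), and the function name are assumptions introduced here, while STOD and PINC follow the meaning given in the text.

```python
import math

def detector_positions(theta, source_radius, stod, pinc, half_count, on_arc=True):
    """Detector coordinates for one divergent projection.

    Returns (source, detectors); the middle entry of detectors is C.
    """
    sx, sy = source_radius * math.cos(theta), source_radius * math.sin(theta)
    # Unit vector from the source toward the origin, and a perpendicular to it.
    ux, uy = -math.cos(theta), -math.sin(theta)
    vx, vy = -uy, ux
    detectors = []
    for k in range(-half_count, half_count + 1):
        if on_arc:
            # Arc length between neighbours is pinc, so the angle is pinc / stod.
            phi = k * pinc / stod
            px = sx + stod * (math.cos(phi) * ux + math.sin(phi) * vx)
            py = sy + stod * (math.cos(phi) * uy + math.sin(phi) * vy)
        else:
            # Tangent line at C: neighbours are pinc apart along the line.
            px = sx + stod * ux + k * pinc * vx
            py = sy + stod * uy + k * pinc * vy
        detectors.append((px, py))
    return (sx, sy), detectors

source, arc = detector_positions(0.0, 10.0, 20.0, 0.5, 2, on_arc=True)
_, line = detector_positions(0.0, 10.0, 20.0, 0.5, 2, on_arc=False)
```

In both variants the central detector is the same point; the two layouts differ only in how the remaining detectors are placed around it.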

In parallel geometry (Fig. 5(b)), a projection consists of a set of parallel line or strip rays. The angle these rays make with the x-axis (denoted by THETA in Fig. 5(b)) is called the projection angle. In the line case one of the rays goes through the origin; in the strip case the origin is equidistant from the two lines bounding one of the strips. The other rays are spaced symmetrically at equal intervals on the two sides of this ray. In the strip case the rays are abutting. Let d denote the distance between the rays in the line case or the width of the rays in the strip case; it is determined as follows. The input specifies the variable PINC and also whether the ray spacing is to be uniform or variable. In the uniform case, d = PINC for all projections. In the variable case it depends on the projection angle THETA: d = PINC × max{|sin THETA|, |cos THETA|}. (A consequence of this definition is that the distance between two consecutive intercepts with either the x- or the y-axis is PINC.)
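
The spacing rule and its stated consequence can be checked with a short sketch. The function names are illustrative assumptions; only PINC and THETA come from the text.

```python
import math

def ray_spacing(pinc, theta, variable=False):
    """Distance d between parallel rays (or strip width) for projection angle theta."""
    if not variable:
        return pinc
    return pinc * max(abs(math.sin(theta)), abs(math.cos(theta)))

def axis_intercept_gaps(d, theta):
    """Gaps between consecutive ray intercepts on the x-axis and on the y-axis.

    Rays make angle theta with the x-axis and are a perpendicular distance d apart.
    """
    s, c = abs(math.sin(theta)), abs(math.cos(theta))
    x_gap = d / s if s > 0.0 else math.inf
    y_gap = d / c if c > 0.0 else math.inf
    return x_gap, y_gap

pinc = 0.1
angles = [0.0, math.pi / 6, math.pi / 4, math.pi / 3, 2.0]
# With variable spacing, the smaller of the two gaps always equals PINC.
smallest_gaps = [
    min(axis_intercept_gaps(ray_spacing(pinc, a, variable=True), a)) for a in angles
]
```

For THETA = 30 degrees, for example, d = PINC × cos 30°, and consecutive rays cut the y-axis exactly PINC apart.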

2.2.3.  Simulating  CT 

Computerized tomography (CT) is a method of imaging the interior of an object (frequently a human body) based on the measurements of X-ray radiation that passes through that object. The density values assigned to elemental objects in the phantom are interpreted as attenuation coefficients of the elemental object. The imaging of three-dimensional (3D) objects can be done in thin sections that are in practice considered to be two-dimensional (2D) images. Each measurement is related to the X-ray source and detector positions lying in the plane of such a section. For each pair of source and detector positions two measurements are taken: a calibration measurement and an actual measurement. A calibration measurement is taken without the object, only through the background material. An actual measurement is taken through the object. Ideally, the X-ray spectrum would have a fixed energy level (monochromatic X-rays), which means that each point has a uniquely assigned attenuation coefficient. In practice, X-rays are made up of a continuous energy spectrum and this can be simulated in SNARK09. When this is done, attenuation of the X-ray beam at a point depends on the material traversed by the beam prior to reaching that point, because more lower energy photons get absorbed by that material than higher energy photons (this is referred to as beam hardening).

Fig. 5 - Schematic of geometry of data collection in CT: (a) divergent and (b) parallel.

SNARK09  is  capable  of  simulating  both  monochromatic  and 
polychromatic  X-rays.  The  polychromaticity  of  the  X-ray  is 
represented  by  up  to  seven  discrete  energy  levels.  The  ray 
sum  for  a  fixed  pair  of  source-detector  positions  is  defined 
by  p  =  -ln(A/C),  where  A  and  C  are  actual  and  calibration 
measurements,  respectively.  The  set  of  p  values  for  all  source- 
detector  pairs  is  called  the  projection  data.  For  polychromatic 
simulation  the  phantom  description  contains  a  list  of  linear 
attenuation  coefficients  corresponding  to  each  discrete  energy 
level. For display purposes, the image of the phantom is
generated  based  on  linear  attenuation  coefficients  of  only  one 
energy  level. 

When CT data collection is simulated in SNARK09, the values of ray sums are used in computations of A. According to options specified in the input file, the simulated data reflect the effects of beam hardening, detector width and scatter, quantum noise and various scanning modes (for a detailed discussion of these effects see, for example, [17]).
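
A minimal sketch of the ray sum p = -ln(A/C) with a discrete spectrum is given below. It assumes that each energy level contributes to the measurement in proportion to a spectral weight and that attenuation follows the exponential law along the ray; the weights, attenuation values and function names are illustrative assumptions and do not reproduce SNARK09 internals.

```python
import math

def measurement(weights, line_integrals):
    """Detected intensity: weighted sum over energy levels of exp(-line integral)."""
    return sum(w * math.exp(-m) for w, m in zip(weights, line_integrals))

def ray_sum(weights, object_integrals, background_integrals):
    """p = -ln(A / C) for actual measurement A and calibration measurement C."""
    actual = measurement(weights, object_integrals)
    calibration = measurement(weights, background_integrals)
    return -math.log(actual / calibration)

# Two energy levels with equal weight; the lower energy is attenuated more.
weights = [0.5, 0.5]
mu = [0.4, 0.2]            # attenuation per unit length at each energy level
no_background = [0.0, 0.0]

p_one = ray_sum(weights, [m * 1.0 for m in mu], no_background)
p_two = ray_sum(weights, [m * 2.0 for m in mu], no_background)
p_mono = ray_sum([1.0], [0.3], [0.0])
```

In the monochromatic case the ray sum equals the line integral of the attenuation coefficient. In the polychromatic case doubling the thickness less than doubles the ray sum, which is the beam hardening effect described above.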

2.2.4.  Simulating  PET 

In positron emission tomography (PET) we are interested in the uptake of positron-emitting isotopes by various parts of the human body. When a positron is emitted it is annihilated with a nearby electron and produces two γ-ray photons of identical energy traveling in approximately opposite directions [20, Fig. 1]. The two photons are detected in near coincidence by a pair of opposite detectors. The annihilation, and thus the positron emission, is known to take place somewhere along the line joining the detector pair [20, Fig. 2]. We count such coincidences for a number of detector pairs around the body. From these measured counts our aim is to estimate the concentration of the positron emitter at various points in the body cross-section.

Fig. 6(a) shows a simplification of PET geometry [20, Fig. 2] consisting of a ring of eight detectors. For simplicity of the illustration, we assume that each detector is coupled with three opposite detectors to detect (near) coincidence arrivals of photons. Thus the lines sampled by each detector form a divergent pattern. By analogy with X-ray CT, we refer to the collections of such (divergent) lines as a projection and the lines themselves as rays in the projection. Thus in Fig. 6(a) we have eight projections with three rays per projection and twelve distinct rays in total in all the projections, since each ray joins two detectors and therefore belongs to two projections.

To simulate measurements by a PET system, SNARK09 utilizes many of the routines written to simulate X-ray CT measurements. Consider Fig. 6(b) to see how the X-ray CT organization is used to simulate data collection for the simplified PET geometry. It shows a schematic of a divergent geometry with the detectors located on an arc. The ray sums are measured along lines joining the “source” and three opposite detectors located on an arc of a circle whose center is at the source. Fig. 6(b) illustrates the situation when the “source” is at location D2 and the detectors are on the arc D'5 D'6 D'7. By simple geometrical considerations we see that, for this arc, the PINC of Section 2.2.2 can be selected so that the line connecting the “source” to a detector D' in the CT geometry goes through the corresponding detector D in the PET geometry and so the ray sums are calculated for the rays whose locations are the correct ones for the PET geometry of Fig. 6(a). In Fig. 6(b) a full scan is made by measuring the ray sums as the source rotates through locations D1 to D8. The PET data simulation is completed by generating, for each ray sum, a Poisson random variable whose mean is given by the value of the ray sum. When simulating PET data collection, SNARK09 ignores attenuation effects.

Fig. 6 - Schematic of geometry of data collection in PET: (a) simplified geometry of eight-detector PET system, and (b) SNARK09 divergent geometry used to simulate the PET geometry shown in (a).
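
The final step, replacing each noiseless ray sum by a Poisson count with that mean, can be sketched as follows. The sampler uses Knuth's multiplication method, chosen here only because it needs nothing beyond the standard library; it is an assumption of this sketch and not a statement about how SNARK09 generates its random numbers.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson variate (Knuth's method; suitable for moderate means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_pet_counts(ray_sums, seed=0):
    """Replace each noiseless ray sum by a Poisson count whose mean is that ray sum."""
    rng = random.Random(seed)
    return [poisson_sample(p, rng) for p in ray_sums]

noiseless = [4.0] * 20000
counts = simulate_pet_counts(noiseless, seed=1)
sample_mean = sum(counts) / len(counts)
```

For large ray sums `math.exp(-mean)` underflows, so a production simulator would switch to a different sampler; the sketch is meant only to make the noise model concrete.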

2.3. Built-in reconstruction algorithms

The SNARK09 package comes with several built-in reconstruction algorithms. It also provides the option for users to define their own reconstruction algorithms and termination tests. In this section we provide a very brief listing of the reconstruction algorithms that are available in SNARK09 together with references to works that further describe those algorithms (a more detailed description of all the reconstruction algorithms is beyond the scope of this publication). The references are not necessarily to the sources that originated the method, but rather to the ones that provide comprehensive descriptions.

The reconstruction methods are often categorized into two groups: transform methods and series expansion methods. SNARK09 provides several algorithms in both categories. The transform methods provided are filtered backprojection (FBP), rho-filtered layergram, Fourier method, and linograms. The series expansion methods provided are algebraic reconstruction techniques (ART), both additive and multiplicative, simultaneous iterative reconstruction technique (SIRT), quadratic optimization methods, and a maximum a posteriori probability (MAP) algorithm for PET based on a modified expectation-maximization (EM) algorithm (referred to as EMAP). The series expansion methods can be used with either pixels or blobs; the transform methods are limited to pixel reconstructions.

FBP can be used for reconstruction from either parallel (see, e.g., [17,21]) or divergent rays (see, e.g., [22,17]). The standard backprojection works by estimating the density at a point by adding all the ray sums of the lines through that point. FBP filters projection data before they are used in the backprojection. Several different types of filters can be specified for this method.

Rho-filtered layergram (see, e.g., [17,21,23]) is a reconstruction
method  that  attempts  to  deblur  the  picture  that  is  obtained 
by  backprojection  alone.  SNARK09  provides  various  deblurring 
methods  that  can  be  used  with  this  algorithm. 

The Fourier method (see, e.g., [17,24]) is based on the projection theorem. Roughly speaking, the projection data are first transformed using the one-dimensional Fourier transform. This provides values of the two-dimensional Fourier transform of the picture on radial lines. From these values the Fourier transform of the picture is estimated at the centers of the pixels of a grid, and the discrete inverse two-dimensional Fourier transform is used to get the reconstructed picture.

Linogram is another method based on the projection theorem (see, e.g., [17,25,26]). Provided that the projection data are collected in a way that matches certain assumptions, the linogram algorithm produces reconstructions faster than FBP and the quality of the reconstructions tends to be better.

ART (see, e.g., [27,17]) is a family of iterative algorithms that, starting from an initial estimate of the picture to be reconstructed, update the estimate through a sequence of steps. A single step is influenced by exactly one ray for which we have an estimate of the ray sum. Only those basis function (pixel or blob) densities that contribute to the associated pseudo ray sum are updated. The updating is done by the addition of a correction term (additive ART) or multiplication by a correction term (multiplicative ART) to the density in each such basis function, so that after the correction the pseudo ray sum for the ray in question will be nearer to the ray sum in the projection data.
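
A single additive ART step can be sketched as below. This uses the standard Kaczmarz-type correction with a relaxation parameter; the tiny dense "ray" vectors (one weight per basis function) and all names are assumptions made for illustration and do not reflect how SNARK09 stores rays or implements its ART variants.

```python
def art_step(x, ray, measured, relaxation=1.0):
    """One additive ART step for a single ray; updates x in place.

    ray[j] is the contribution of basis function j to the pseudo ray sum.
    """
    pseudo = sum(a * v for a, v in zip(ray, x))
    norm_sq = sum(a * a for a in ray)
    if norm_sq == 0.0:
        return x
    scale = relaxation * (measured - pseudo) / norm_sq
    for j, a in enumerate(ray):
        if a != 0.0:          # only basis functions met by the ray change
            x[j] += scale * a
    return x

# One step: the untouched third density keeps its value.
single = art_step([0.0, 0.0, 5.0], [1.0, 1.0, 0.0], 2.0)

# Repeated sweeps over three rays of a consistent system.
rays = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
data = [3.0, 5.0, 4.0]        # ray sums of the picture [1, 2, 3]
estimate = [0.0, 0.0, 0.0]
for _ in range(200):
    for ray, p in zip(rays, data):
        art_step(estimate, ray, p)
```

With relaxation 1.0 the pseudo ray sum of the processed ray matches the measured value exactly after the step; smaller values move it only part of the way.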

SIRT (see, e.g., [28,29]) is an iterative procedure that, starting from an initial estimate of the picture to be reconstructed, updates the estimate through a sequence of steps. Roughly speaking, the correction at each update is the discrete backprojection of a set of “projection error data” that consists of all the differences between the given ray sums and corresponding pseudo ray sums from the current estimate of the image.
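
The simultaneous update can be sketched as follows. The normalization by row and column sums is one common SIRT variant, assumed here for concreteness; the text does not say which normalization SNARK09 uses, and the dense ray vectors and names are illustrative.

```python
def sirt_step(x, rays, measured, relaxation=1.0):
    """One SIRT update: backproject the normalized projection errors of all rays.

    Assumes non-negative ray weights; rays[i][j] is the contribution of basis
    function j to the pseudo ray sum of ray i.
    """
    n = len(x)
    row_sums = [sum(r) for r in rays]
    col_sums = [sum(r[j] for r in rays) for j in range(n)]
    errors = []
    for r, p, rs in zip(rays, measured, row_sums):
        pseudo = sum(a * v for a, v in zip(r, x))
        errors.append((p - pseudo) / rs if rs > 0.0 else 0.0)
    updated = []
    for j in range(n):
        if col_sums[j] > 0.0:
            back = sum(rays[i][j] * errors[i] for i in range(len(rays)))
            updated.append(x[j] + relaxation * back / col_sums[j])
        else:
            updated.append(x[j])
    return updated

rays = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
data = [3.0, 5.0, 4.0]        # ray sums of the picture [1, 2, 3]
estimate = [0.0, 0.0, 0.0]
for _ in range(200):
    estimate = sirt_step(estimate, rays, data)
```

Unlike ART, every ray contributes to each update, so the estimate changes only once per pass over the projection data.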

Quadratic  optimization  techniques  are  a  family  of  algorithms 
(see,  e.g.,  [17,30,31])  that  minimize  a  quadratic  function  of  the 
vector  of  basis  function  densities  using  an  iterative  process. 
There  are  several  choices  available  for  the  quadratic  function 
to  be  minimized  and  the  minimization  method  to  be  used. 

EMAP  is  a  maximum  a  posteriori  probability  (MAP)  algorithm 
for  PET  based  on  a  modified  expectation-maximization  (EM) 
algorithm  (see  [32,33]). 

In  addition  to  the  built-in  algorithms,  the  users  can  add  up 
to  ten  user-defined  reconstruction  algorithms  to  SNARK09. 

2.4.  Evaluation 

For single reconstructions, SNARK09 provides means for the evaluation of some quantitative measures of the overall difference between a digitized test phantom and its reconstruction. Such an evaluation can be performed either over the entire region of the image or over selected areas, and can also be restricted to pixels whose densities fall within a user-selected range.

2.5.  Experimenter 

It is often desirable to evaluate the relative efficacy of two or more reconstruction methods for a specific medical task in a manner that is statistically sound [34-37]. Such an evaluation must be done using a sample set that is large enough to provide a statistically significant result. Performing this evaluation on mathematical phantoms requires a means of running the competing algorithms on projection data obtained from a large number of randomly generated phantoms. Thereafter, various numerical measures of agreement between the reconstructed images and the original phantoms may be used to reach a conclusion that has some statistical substance. A straightforward way of achieving this goal is to provide a front-end or driver program that contains all the requisite commands that may be fed to SNARK09 to generate as many phantoms as needed together with their projection data, to implement the desired reconstruction algorithms on these data, and to evaluate the reconstructed images. Such a driver program is provided by SNARK09 in the form of the Experimenter module. The method used in the comparative evaluation of the algorithms consists of the following:

• generation of random samples from a statistically described ensemble of phantoms and their projection data;

• reconstruction from the projection data by each of the algorithms to be compared;

• assignment to each reconstructed image of a figure of merit (FOM), which measures the appropriateness of the image for solving the specified task;

• calculation of the statistical significance, based on the FOMs for all reconstructions, at which we can reject the null hypothesis that the methods are equally helpful for solving the task in favor of the alternative hypothesis that the one with the higher average FOM is more helpful.
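
The last step of the list above can be illustrated with a paired significance test on the FOMs. The text does not state which test the Experimenter module applies, so the exact sign-flip permutation test below is a stand-in chosen for this sketch because it needs no statistical tables; the function name and the FOM values are hypothetical.

```python
from itertools import product

def paired_sign_flip_p_value(fom_a, fom_b):
    """One-sided exact p-value for "method A has the higher FOM".

    Under the null hypothesis that the two methods are equally helpful, the
    sign of each paired difference is equally likely to be + or -. All 2**n
    sign patterns are enumerated, so this is meant for small samples only.
    """
    diffs = [a - b for a, b in zip(fom_a, fom_b)]
    observed = sum(diffs)
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total

# Hypothetical FOMs of two algorithms on the same four phantoms.
fom_a = [0.90, 0.80, 0.85, 0.95]
fom_b = [0.70, 0.75, 0.80, 0.90]
p_value = paired_sign_flip_p_value(fom_a, fom_b)
```

The pairing matters: both algorithms are run on projection data from the same randomly generated phantoms, so the comparison is made phantom by phantom.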

The ensemble of phantoms available for multiple runs of SNARK09 within the Experimenter module has several possible sources of randomness:

(1) Users may specify a list of multiple phantom descriptions that are chosen at random during the experiment.

(2) For a fixed pair of distinct density values, paired structures, which are elemental objects that appear symmetrically with respect to the vertical line through the center of a phantom, can be assigned densities in such a way that in each pair exactly one structure has one of the two fixed density values. This assignment is done in a random manner at the time of phantom generation. Given s paired structures, there are 2^s possible phantoms, assuming that the paired structures are the only source of variability in the ensemble.

(3) Random inhomogeneity can be added to the pixel densities each time a new phantom is generated.

(4) Noise in the projection data may be generated at random each time a projection dataset is generated.

SNARK09 provides several built-in FOMs. It also allows users to create new FOMs that are more appropriate for a task at hand. Below we provide a brief description of FOMs that are built into SNARK09 together with references to works that further describe them.

The structural accuracy FOM [35,37] is computed as follows. Consider a phantom that contains a total of N structures. For a reconstruction, let $\alpha_k^r$ be the average pixel value for those pixels whose centers are within the structure k. Let $\alpha_k^p$ be the average pixel value of the corresponding structure in the phantom. The structural accuracy of a reconstruction is defined as

$$-\frac{1}{N} \sum_{k=1}^{N} \left| \alpha_k^r - \alpha_k^p \right|. \qquad (9)$$

The pointwise accuracy FOM [35,37] is defined as the negative of the normalized root mean square distance between a reconstruction and the phantom. It is sometimes desirable to compute the pointwise accuracy when both the phantom and the reconstruction are clipped to a specified density range.

The hit-ratio FOM [35,37] is calculated only for those phantoms containing paired structures (such paired structures have unequal densities). For such pairs a hit occurs if the structure in the pair with the higher average density in the phantom is also the structure in the pair with the higher average density in the reconstruction. The hit-ratio for a reconstruction is the number of hits divided by the total number of pairs.

The imagewise region of interest (IROI) FOM [6] is calculated only for phantoms that contain paired structures. Such paired structures must have unequal densities with one of the structures having non-zero density (we refer to it as the tumor) and the other having density zero. The pairs of structures are numbered from 1 to B. For 1 ≤ b ≤ B, let $\alpha_t^p(b)$ (respectively, $\alpha_n^p(b)$) denote the average density in the phantom of the structure of the bth pair that is (respectively, is not) the tumor. We specify similarly $\alpha_t^r(b)$ (respectively, $\alpha_n^r(b)$), for the reconstruction. The imagewise region of interest FOM is defined by

$$\mathrm{IROI} = \frac{\dfrac{\sum_{b=1}^{B} \left( \alpha_t^r(b) - \alpha_n^r(b) \right)}{\sqrt{\sum_{b=1}^{B} \left( \alpha_n^r(b) - \frac{1}{B} \sum_{b'=1}^{B} \alpha_n^r(b') \right)^2}}}{\dfrac{\sum_{b=1}^{B} \left( \alpha_t^p(b) - \alpha_n^p(b) \right)}{\sqrt{\sum_{b=1}^{B} \left( \alpha_n^p(b) - \frac{1}{B} \sum_{b'=1}^{B} \alpha_n^p(b') \right)^2}}}.$$

The first thing to note about this formula is that the numerator and the denominator in the big fraction are exactly the same except that the numerator refers to the reconstruction and the denominator refers to the phantom. Thus, if the reconstruction is perfect (in the sense of being identical to the phantom) then IROI = 1. Analyzing the contents of the numerator and the denominator, we see that they are (except for constants that cancel out) the mean difference between the average values at the tumor site and the corresponding non-tumor site divided by the standard deviation of the average values at the non-tumor sites. It has been found by experiments with human observers that this FOM correlates well with the performance of people [6].
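
Two of the FOMs described in this section, pointwise accuracy and hit ratio, can be sketched as follows. The normalization used for the root mean square distance (the phantom's deviation from its own mean) is an assumption of this sketch, since the text only says "normalized"; the flat lists of pixel values, the pair tuples and the function names are likewise illustrative.

```python
import math

def pointwise_accuracy(phantom, recon):
    """Negative normalized RMS distance between reconstruction and phantom.

    Assumed normalization: the phantom's squared deviation from its mean.
    """
    mean = sum(phantom) / len(phantom)
    num = sum((p - r) ** 2 for p, r in zip(phantom, recon))
    den = sum((p - mean) ** 2 for p in phantom)
    return -math.sqrt(num / den)

def hit_ratio(phantom_pairs, recon_pairs):
    """Fraction of pairs whose denser structure is the same in both images.

    Each pair holds the average densities of the two structures of that pair.
    A tie in the reconstruction is not counted as a hit.
    """
    hits = 0
    for (pa, pb), (ra, rb) in zip(phantom_pairs, recon_pairs):
        if (pa - pb) * (ra - rb) > 0:
            hits += 1
    return hits / len(phantom_pairs)

accuracy = pointwise_accuracy([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.5])
ratio = hit_ratio(
    [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(0.8, 0.2), (0.6, 0.4), (0.5, 0.5)],
)
```

Both functions return larger values for better reconstructions, which is what allows FOMs of different kinds to be compared with the same statistical machinery.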


3.  System  description 

In  this  section  we  discuss  the  structure  of  the  SNARK09 
package,  its  various  modules  and  the  graphical  user  interfaces. 

3.1.  Application  framework 

A SNARK09 run can be subdivided into three phases: (1) data generation, (2) initialization and reconstruction, and (3) analysis.

Each  of  these  phases  requires  some  input  data  and 
produces  some  output  data.  Some  of  the  output  is  used  as 
the  input  of  a  later  (or  even  the  same)  phase.  Provided  that 
the  appropriate  input  data  are  available,  a  single  SNARK09  run 
may  consist  of  one,  two  or  all  three  of  these  phases. 

We now proceed with a description of each of the three phases. The reader should consult Fig. 7 for an overview. The specific details of mandatory and optional parts of the input are discussed in the online manual [2]. The reader should be aware that the word “input” is used both for the stream of commands that drive a whole SNARK09 run and for what is considered to be the input data to any of the phases of such a run.

Fig. 7 - Data flow in SNARK09.

3.1.1.  Data  generation  phase 

During this phase SNARK09 generates a phantom and projection data of it. The projection data consist of real ray sums of the phantom, possibly contaminated by the types of noise that one may come across in a device used for collecting data for reconstruction.

Input: The input for this phase consists of (1) the geometrical description of the phantom, and (2) a description of the projections, including geometry of data collection, number and distribution of projections and noise present during the data collection process.

Output:  The  output  of  this  phase  consists  of  (1)  a  copy  of 
the  geometrical  description  of  the  phantom,  (2)  a  pixel  by 
pixel  description  of  the  phantom  for  one  or  more  energy  levels 
(according  to  specifications  provided  in  the  input),  (3)  a  copy 
of  the  geometry  of  data  collection,  number  and  distribution 
of  projections  and  noise  present  during  the  data  collection 
process,  and  (4)  a  ray  by  ray  description  of  each  projection. 
The output is self-contained in the sense that it contains
all  the  information  provided  by  the  input  together  with  the 
newly  generated  data.  This  is  used  as  input  for  the  subsequent 
phases  either  in  the  same  run  of  SNARK09  or  in  separate  runs, 
in  which  the  presence  of  the  original  input  data  is  not  required. 

3.1.2.  Initialization  and  reconstruction  phase 

The main goal of this phase is to perform the reconstruction based on the available projection data. This phase can be further divided into two sub-phases: (1) initialization and (2) reconstruction. The latter cannot be performed without the former having been performed in the same run.

During the initialization sub-phase, the nature of the grid for the reconstruction region as well as the assumed geometry of data collection are determined. The phantom and projection data are written as output files in the XML (Extensible Markup Language) format, which is suitable for easy access in subsequent processing steps and for visualization. During the reconstruction sub-phase the reconstruction algorithms are carried out and the results are saved.

Fig. 8 - SNARK09Input is a graphical user interface used for creation of SNARK09 input files.

Input: The first source of input data is the main input file for the SNARK09 run. It contains (1) the description of the reconstruction region, and (2) the list of algorithms to be used for the reconstruction. The second input file is simply the output file produced in the previous phase. This file does not have to be created in the same run of SNARK09. In fact, the second file can be produced manually and filled with data obtained by a real imaging device.

Output: There are two XML output files produced by this phase. The first one contains just a copy of the projection data. It is written in the format used in the subsequent phases and for visualization. The second file contains a copy of the phantom (if it is available) and of all the reconstructions in the current run of SNARK09. If iterative reconstruction methods are used, then the reconstructions produced by each of the iterative steps are saved.

3.1.3.  Analysis  phase 

In the final phase, the data obtained by the reconstruction algorithms can be further processed and analyzed. Depending on the commands in the input file there are several things that may be achieved here: (1) statistical analysis of the results, (2) comparison of reconstruction(s) with a phantom, (3) storing of a reconstruction in a format that allows for its later use as a starting point for another reconstruction algorithm, and (4) saving of the reconstructions in a standard image format.


Input: The first source of input data is the main input file for the SNARK09 run. It contains commands that indicate what needs to be computed and in what format it is to be written. The second input file is the data file produced by the previous phase (in the same or a separate run) that contains all the reconstructions and the original phantom (if there is one).

Output:  The  output  files  depend  on  what  the  user  specified 
in  the  original  input  file.  They  can  be  text  files  with  results  of 
statistical  analysis  and  image  files  with  requested  graphics. 

3.2.  DIG  libraries 

The  DIG  libraries  are  used  in  the  creation  of  the  projection 
and  reconstruction  data  files  produced  in  SNARK09  runs.  They 
provide  routines  that  can  be  used  easily  to  access,  extract  and 
modify  the  data  stored  in  those  data  files.  Programmers  who 
need  to  process  further  the  projection  and  reconstruction  data 
should  use  these  libraries  to  obtain  easy  and  safe  access  to 
them. 

3.3.  Graphical  user  interfaces 

To ease the use of SNARK09, two interactive graphical user interfaces have been designed for creating input files and for visualization of projection data, reconstructions and the analysis of the results. SNARK09Input assists users in the creation of input files used in SNARK09; see Fig. 8. SNARK09Display allows users to display the outputs of a SNARK09 run; see Fig. 9. It can display 2D images of projection data and of reconstructions at user-defined gray-level intensity values and plot their row/column profiles. It can also display data analysis results graphically. The images presented in the next section for an example run of SNARK09 have all been generated using SNARK09Display.

Fig. 9 - SNARK09Display is a graphical user interface used for display of results obtained by a SNARK09 run.


4.  Example  of  use 

In this section we illustrate in detail an example of how
SNARK09 can be used in practice. We present multiple
features of the package, but it is impossible to make use of
all features in a single example. The reader is referred to the
SNARK09 manual [2] for the detailed listing of all the available
features and for many more examples of its use. There are
ten worked-out examples in the manual with input files and
output generated by SNARK09; these examples are also provided
when the SNARK09 package is downloaded from its website.
The book [17] used SNARK09 and the phantom from Fig. 3b
for demonstration of many concepts related to computerized
tomography.

In the example reported here, we evaluate the usefulness
of two reconstruction algorithms for recovery of low-contrast
tumors in lung tissue. The Experimenter part of SNARK09
allows us to do such an evaluation. We need to choose an
ensemble of phantoms, one or more figures of merit (FOMs),
and two or more algorithms whose performance is being
compared.

Fig. 10 - (a and b) Thorax phantom with randomly generated small low-contrast tumors in the lungs and tissue
inhomogeneity, (c and d) ART reconstruction using pixel basis functions, (e and f) ART reconstruction using blob basis
functions. In all cases, the image on the left uses the full display window of grayscale values while the image on the right
uses a narrow display window for better visualization of the low-contrast tumors.

Consider the thorax phantom shown in Fig. 2(b). We modified
the phantom by adding a list of 20 pairs of possible
tumor sites in the lung. The tumors are represented by
small circles with linear attenuation coefficients 15% higher
than the underlying lung tissue. Furthermore, we added
inhomogeneity to the phantom to represent the biological tissue
more accurately. This is done in SNARK09 by adding to
each pixel a random value from a zero-mean Gaussian distribution
with a specified standard deviation. In this case, we
used a standard deviation of 6% of the underlying density.
The inhomogeneity of the tissue significantly lowers the
difference in density between the lung tissue and the tumors,
making it more challenging for the algorithms to recover the
tumors correctly. During the experiment, for each pair of
possible tumor sites, the tumor is randomly placed either in the
left or right lung, giving us 2^20 (about one million) possible
phantoms even before the inhomogeneity is added. The contrast
between the tumors and lung tissue is so low that, when the
phantom is displayed using the full range of attenuation
coefficients mapped to grayscale values, the tumors are practically
invisible; see Fig. 10(a). This is due to the much higher
attenuation coefficients of bone and muscle tissue as compared to
the lung. The tumor sites become visible when the display window
of gray values is narrowed; see Fig. 10(b).
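The random elements of the phantom ensemble described above can be sketched in a few lines. This is only an illustration under stated assumptions, not SNARK09 code: the function names are hypothetical, the lung value is taken from Table 2 at 60 keV, and individual pixel values stand in for a full phantom image.

```python
import random

N_PAIRS = 20            # pairs of candidate tumor sites (from the text)
TUMOR_CONTRAST = 0.15   # tumors are 15% above the lung background (from the text)
SIGMA_FRACTION = 0.06   # noise std as a fraction of the local density (from the text)
LUNG_MU = 0.049         # lung attenuation at 60 keV in cm^-1 (Table 2)


def choose_tumor_sides(n_pairs, rng):
    """For every pair of candidate sites, place the tumor left or right at random."""
    return [rng.choice(("left", "right")) for _ in range(n_pairs)]


def add_inhomogeneity(density, rng, sigma_fraction=SIGMA_FRACTION):
    """Add zero-mean Gaussian noise whose std is a fraction of the density."""
    return density + rng.gauss(0.0, sigma_fraction * density)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sides = choose_tumor_sides(N_PAIRS, rng)
n_configurations = 2 ** N_PAIRS  # number of distinct left/right assignments

tumor_mu = LUNG_MU * (1.0 + TUMOR_CONTRAST)
lung_pixels = [add_inhomogeneity(LUNG_MU, rng) for _ in range(1000)]
tumor_pixels = [add_inhomogeneity(tumor_mu, rng) for _ in range(1000)]
```

The nominal lung-to-tumor difference (about 0.007 cm^-1 here) is only a little more than twice the noise standard deviation in the lung (about 0.003 cm^-1), which is why single pixels are of little help in telling tumor from background.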

We simulated polychromatic X-rays. To do so, we used
five discrete energy levels. The attenuation coefficients for
the five energy levels are listed in Table 2. The attenuation
coefficients for an energy of 60 keV correspond to the
ones in the phantom description in Table 1. The projection
data were obtained from 360 angles equally spaced in the
range [0°, 360°) using divergent rays. Each projection
contained 363 rays. The collected data were corrupted by quantum
and scatter noise. The data were corrected for beam hardening
(due to polychromaticity of the X-ray beam) before they were used
for reconstruction. The projections for one of the randomly
generated phantoms are shown as columns of the image in
Fig. 11.
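To see why a polychromatic beam calls for a beam hardening correction, the following sketch computes the ray sum through a uniform slab of muscle using the five energies of Table 2. It is not the SNARK09 implementation: the equal spectral weights are an assumption (the spectrum is not given here), and the function names are hypothetical.

```python
import math

# Muscle attenuation coefficients in cm^-1 at 40, 50, 60, 80 and 100 keV (Table 2).
MU_MUSCLE = [0.249, 0.214, 0.196, 0.178, 0.167]
# ASSUMPTION: equal spectral weights; the actual spectrum is not stated in the text.
WEIGHTS = [0.2] * 5


def polychromatic_ray_sum(mu_by_energy, weights, thickness_cm):
    """Return -ln(I/I0) for a ray through a uniform slab, summed over energies."""
    transmitted = sum(w * math.exp(-mu * thickness_cm)
                      for mu, w in zip(mu_by_energy, weights))
    return -math.log(transmitted)


def effective_mu(mu_by_energy, weights, thickness_cm):
    """Apparent attenuation coefficient seen by the detector for this slab."""
    return polychromatic_ray_sum(mu_by_energy, weights, thickness_cm) / thickness_cm


thin = effective_mu(MU_MUSCLE, WEIGHTS, 0.01)   # close to the weighted mean of mu
thick = effective_mu(MU_MUSCLE, WEIGHTS, 30.0)  # lower: soft photons are filtered out
```

The apparent attenuation coefficient falls as the slab gets thicker, so the measured ray sum is not proportional to thickness; correcting for this nonlinearity is the purpose of the beam hardening step.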

We compared the two variants of the built-in ART algorithm:
one using pixels, the other using blobs. We used a
built-in FOM: imagewise region of interest (IROI). The IROI FOM
has been confirmed to correlate well with human observers for
detectability of small, low-density features [6]. The number of
FOMs computed for each experiment is up to the user.

Using SNARK09 Experimenter, both versions of ART were
automatically run thirty times, each time generating a new
phantom and a new set of projection data based on which
reconstructions were computed. The reconstructions computed
in one such run are shown in Fig. 10(c)-(f). When viewed
in the full window of grayscale values (see Fig. 10(c) and (e))
the two reconstructions are almost indistinguishable. The
differences appear only when the images are viewed using a
much narrower display window (see Fig. 10(d) and (f)). After all
runs completed, statistical significance was computed using
the FOM values for each reconstruction algorithm. The results
of this statistical analysis are presented in Table 3. The table
shows average values of the figure of merit computed for
different iterations of ART. For ART using pixels the highest
average FOM was obtained at the eighth iteration. For ART
using blobs the highest average FOM was obtained at the fifth
iteration. The last column in Table 3 shows that the differences
are statistically significant. Thus, according to the values of
the IROI FOM, we can reject the null hypothesis that the two
variants of ART perform equally well for detection of small
low-contrast tumors in favor of the alternative hypothesis that
ART using blobs performs better.

Fig. 11 - Projection dataset obtained based on the thorax
phantom shown in Fig. 10(a). Each column of pixels in the
image corresponds to a single projection.
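As an illustration of how a significance level can be attached to such paired FOM values, the sketch below applies a one-sided test to the per-phantom differences using a normal approximation. This is an assumption chosen for illustration and not necessarily the procedure implemented in the SNARK09 Experimenter; the function name and the four FOM pairs are hypothetical and are not taken from Table 3.

```python
import math


def paired_significance(fom_a, fom_b):
    """One-sided p-value for the alternative 'b scores higher than a'.

    Uses the mean of the paired differences and a normal approximation.
    Assumes at least two pairs and differences that are not all identical
    (otherwise the variance is zero and the statistic is undefined).
    """
    diffs = [b - a for a, b in zip(fom_a, fom_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    z = mean / math.sqrt(var / n)
    # Upper-tail probability of a standard normal variable.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical FOM values for four phantoms (illustration only).
fom_pixels = [0.80, 0.90, 0.85, 0.88]
fom_blobs = [1.10, 1.05, 1.15, 1.12]
p_value = paired_significance(fom_pixels, fom_blobs)
```

A small p-value means that a difference this large would be very unlikely if the two algorithms performed equally well, which is the sense in which the null hypothesis is rejected above.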

The  reconstruction  obtained  using  ART  with  blobs 
smoothes  the  inhomogeneities  in  the  lungs,  resulting  in  an 
increased  contrast  between  the  tumors  and  their  background. 
This  can  be  seen  in  the  plots  of  the  density  values  along  col¬ 
umn  191  of  the  reconstructions  and  the  phantom  shown  in 
Fig.  12.  In  fact,  the  lung  in  the  reconstruction  using  blobs 
appears  to  be  smoother  than  in  the  phantom;  which  results  in 
the  IROI  FOM  having  a  value  greater  than  one.  The  claims  of 
superiority  of  one  algorithm  over  another  can  be  made  only  for 
the  FOM  that  measures  the  task  at  hand.  From  Fig.  12  it  is  clear 
that  the  variability  in  the  lung  tissue  was  not  recovered  well  by 
the  ART  with  blobs.  But  recovery  of  the  background  variability 
was  not  the  task  that  was  evaluated;  the  task  was  to  detect 
small  low-contrast  tumors  and,  by  smoothing  the  reconstruc¬ 
tion,  ART  with  blobs  generated  reconstructions  with  small 
tumors  that  are  visible  more  clearly  than  in  the  reconstruct¬ 
ions  produced  by  the  ART  with  pixels. 


5.  Availability  and  system  requirements 

SNARK09, SNARK09Input and SNARK09Display are all open
source. They are available for download on the SNARK09 website
[1] free of charge.

Table 2 - Linear attenuation coefficients (in cm^-1) as a function of photon energy for tissues that occur in the thorax
phantom.

Energy (keV)   Muscle   Blood   Fat     Lung    Compact bone   Soft bone   Tumors
40             0.249    0.278   0.224   0.062   0.642          0.520       0.071
50             0.214    0.234   0.198   0.055   0.455          0.382       0.063
60             0.196    0.214   0.184   0.049   0.371          0.318       0.056
80             0.178    0.189   0.170   0.047   0.298          0.261       0.054
100            0.167    0.176   0.161   0.042   0.265          0.235       0.048

Fig. 12 - Comparison of the plots of the density values along column 191 through the two reconstructions (in black) and the
phantom (in gray): (a) ART with pixels and (b) ART with blobs. Panel titles: (a) "Cross section through phantom and pixel
reconstruction :: Column 191"; (b) "Cross section through phantom and blob reconstruction :: Column 191".

Table 3 - Statistical analysis results computed by SNARK09 Experimenter for the example in Section 4. The statistical
significance of the observed differences between the performance of the pixel and blob algorithms (as measured by the
IROI FOM) was calculated and is given in the last column. The last row compares the best iteration of each algorithm.

FOM: Imagewise-ROI

ART with pixels         ART with blobs          Significance level
Iteration   Mean        Iteration   Mean
1           0.2438      1           0.8386      0.00000005
3           0.7339      3           1.0320      0.00000906
5           0.8557      5           1.1144      0.00002006
8           0.8659      8           1.1043      0.00001781
10          0.8497      10          1.0957      0.00001194
8           0.8659      5           1.1144      0.00001688

The package is a Linux/Unix based system. It runs on
a typical modern PC and has no specific hardware requirements.
The software libraries used by SNARK09 are provided
in the repositories of all the major Linux distributions. SNARK09 is
implemented in C/C++, which are available on a wide variety
of hardware and operating system platforms and are currently
among the most popular programming languages used by
computer scientists. The standard development packages that
come with a typical Linux distribution are sufficient to compile
and build the package.


6.  Future  work 

SNARK09 is a package that is the result of more than three
decades of continuous development. It evolves as the field of
tomographic reconstruction changes. There are many aspects
of it that can be modified and expanded. We plan to rewrite
some of the existing code to make it computationally more
efficient and to take advantage of the multiprocessing
hardware (multicore processors and/or graphics processing
units) that has become, in recent years, standard on a typical
desktop computer. We also plan to incorporate into the
standard SNARK09 code new reconstruction algorithms that
we are currently using as user-defined routines. An example
of this is the recently developed superiorization methodology
for image reconstruction, which has been implemented
and thoroughly investigated within SNARK09 via user-defined
routines; see, e.g., [38,39].

Abstract - Proton radiography generates two-dimensional projection
images of an object and has applications in patient
alignment and verification procedures for proton beam radiation
therapy. The quality of the image, both contrast and spatial
resolution, is affected by the energy of the protons used in the
creation of the radiograph, as well as by multiple Coulomb scattering
and energy-loss straggling. Here we report an experiment
which used 200 MeV protons to generate proton energy-loss and
scattering radiographs of a hand phantom. It was found that
while both radiographs displayed anatomical details of the hand
phantom, the energy-loss radiograph has a noticeably higher
spatial resolution. The scattering radiograph may yield sharper
edges between soft and bone tissue than the energy-loss radiograph,
but this requires further study. These radiographs demonstrate
the new promise of proton imaging (proton radiography and
CT), which is now within reach of becoming a new, potentially low-dose
medical imaging modality. The experiment used the current
first-generation proton CT scanner prototype, which is installed on
the research beam line of the clinical proton synchrotron at Loma
Linda University Medical Center. This study contributes to the
optimization of the performance of a clinical proton CT scanner.

Index  Terms — proton  imaging,  tomographic  reconstruction  of 
material  properties,  spatial  resolution,  data  reduction 


I.  Introduction 

With increasing use of proton radiation therapy for cancer
patients, research into new imaging methods that can improve
the accuracy of proton range estimates in radiation therapy
planning has become a high priority. Protons are particularly
desirable for treating cancerous tissue in close proximity to
radiosensitive normal tissues, such as at the base of skull
and near the spinal cord. Protons are preferable to photons
because their energies are easily tuned, the unhealthy area
can be isolated, and the dose can be localized, reducing the
threat of damaging otherwise healthy tissue. Most importantly,
the greatest radiation dose occurs only in the last 2% of the
proton’s range, at the Bragg peak, so a maximum amount of
healthy tissue can be spared when the position of the Bragg
peak is controlled.

In order to obtain relative stopping power (RSP), Hounsfield
units (i.e. units of x-ray attenuation used in x-ray CT) are
transformed using a calibration curve. However, there is no unique
relationship between Hounsfield units and RSP, especially in
the regime of RSP ≈ 1 (i.e. water, human tissue). This means
that during conversion, errors in proton range are consistently
3-4% of the nominal proton range or even higher in regions
containing bone [1]. A recent survey by the American Association
of Physicists in Medicine (AAPM) showed that 33%
of attendees polled said that range uncertainties are the main
obstacle to making proton therapy mainstream [2]. Simulations
and first experimental results have shown that using a proton
CT imaging system one may be able to reduce this range
uncertainty to about 1% or less without increasing the dose to
the patient.

Proton CT differs in several key aspects from x-ray CT.
While unscattered photons travel in straight line paths, protons
do not; they undergo multiple Coulomb scattering
(MCS), which limits the usefulness of the standard
filtered back projection (FBP) approach to reconstruction.
In  fact,  proton  CT  images  reconstructed  with  the  classical 
FBP  algorithm  suffer  from  loss  of  spatial  resolution  since 
the  proton  path  deviates  from  the  assumed  straight  lines  by 
up  to  several  millimeters  in  anatomical  objects  encountered 
in  medical  proton  CT  imaging.  The  accuracy  of  those  path 
estimates  is  critical  for  achieving  a  high  spatial  resolution  in 
proton  CT. 

A.  Current  Prototype  Design 

A low intensity, high energy (100-200 MeV) cone beam of
protons traverses a phantom. Silicon strip detectors (228 µm
pitch) record the proton path in 4 planes (each 400 µm thick)
so entry and exit vectors can be easily determined. Detectors
interface through a high speed field programmable gate
array (FPGA)-based data acquisition system. A calorimeter
composed of an array of 18 CsI crystals is used to detect the
residual energies of incident protons at a rate of up to 100k
protons/sec.

B.  Reconstruction  Software 

Mathematical algorithms and computer software are used to
reconstruct the phantom from raw data [3]. Raw data contain
the proton tracker coordinates and the calorimeter’s response
for each proton. The software bins the exit tracker data into
spatial bins (pixels) and determines cuts in relative angle,
defined as the difference between entry and exit angle, at 3σ
from each pixel’s mean relative scattering angle. These cuts are
made to exclude events that have very large scattering angles,
caused by inelastic nuclear interactions or elastic large angle
scattering events inside the phantom. The software also makes
cuts in water equivalent path length (WEPL) given by:

$$L = \int_{l} \varrho \, dl, \qquad (1)$$

where $\varrho$ is the ratio of the stopping power of the material to
the stopping power of water (i.e. the RSP) and $l$ defines the
path of the proton. These cuts are also made at 3σ from the
mean pixel value, and are necessary to ensure that erroneously
large energy measurements, caused by the coincidence of two
or more particles in the calorimeter, are excluded.

Fig. 1 - First radiograph of a hand phantom with 0.5 mm
pixels (scale in cm of WEPL). The RSP of bone is only about
50% greater than that of water, resulting in the low contrast
between the bones and soft tissue. The line traversing the
image corresponds to the image profile analyzed in Fig. 4.
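On a voxel grid, the path integral of Eq. 1 becomes a sum of RSP values weighted by the length of the proton path inside each voxel. The sketch below shows this discrete form; the function name and the example path are hypothetical, with the bone RSP of 1.5 chosen to match the "about 50% greater than water" figure quoted for Fig. 1.

```python
def wepl_cm(rsp_values, segment_lengths_cm):
    """Discrete form of Eq. 1: sum of RSP times path length in each voxel."""
    if len(rsp_values) != len(segment_lengths_cm):
        raise ValueError("need one path length per RSP value")
    return sum(rsp * length for rsp, length in zip(rsp_values, segment_lengths_cm))


# Hypothetical path: 2 cm of soft tissue, 1 cm of bone, 2 cm of soft tissue.
rsp_along_path = [1.0, 1.5, 1.0]
lengths_cm = [2.0, 1.0, 2.0]
example_wepl = wepl_cm(rsp_along_path, lengths_cm)  # 5.5 cm of water equivalent
```

A geometric path of 5 cm thus corresponds to 5.5 cm of water, and the extra 0.5 cm contributed by the bone is the contrast that the energy-loss radiograph displays.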

II.  Energy-Loss  Radiography  and  Water 
Equivalent  Path  Length  (WEPL) 

The quantity of importance for proton treatment planning is
the relative stopping power (RSP) of protons with respect to water.
RSP, the integrand in Eq. 1, is practically energy independent and is
determined mostly by the electron density of the material or
tissue.

We  calibrate  the  calorimeter  response  to  the  integral  of  the 
RSP  directly.  For  each  pixel,  we  define  a  mode  window  of 
WEPL  that  accepts  protons  within  ±  30%  of  the  mode,  or 
±1  cm  if  30%  is  less  than  1  cm,  and  make  the  appropriate 
cuts  during  reconstruction.  Fig.  1  is  a  radiograph  of  a  hand 
phantom  using  this  energy-loss  technique  and  data  reduction 
process. 

The WEPL distribution of protons in each pixel is roughly
Gaussian, as seen in Fig. 3(a). The distribution is usually
skewed to the right (high WEPL), which corresponds to the
left-skewed (low-energy) distributions in energy. The protons
in the tails are protons that underwent nuclear scattering
events. These are the events that we wish to reduce by
appropriate cuts.
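The two-stage selection described above (a mode window followed by 3σ cuts) can be sketched for a single pixel as follows. This is a minimal illustration and not the reconstruction software itself: the fixed-width histogram used to estimate the mode, the bin width, and the function name are all assumptions.

```python
import statistics


def select_protons(wepl_values_cm, bin_width_cm=0.1, window_fraction=0.30,
                   min_half_width_cm=1.0, n_sigma=3.0):
    """Return the WEPL values of one pixel that survive the two-stage cut."""
    # Stage 1: estimate the mode from a histogram (ASSUMPTION: fixed-width bins).
    counts = {}
    for w in wepl_values_cm:
        index = int(w // bin_width_cm)
        counts[index] = counts.get(index, 0) + 1
    mode_bin = max(counts, key=counts.get)
    mode = (mode_bin + 0.5) * bin_width_cm

    # Mode window: +/-30% of the mode, or +/-1 cm if 30% is less than 1 cm.
    half_width = max(window_fraction * abs(mode), min_half_width_cm)
    window = [w for w in wepl_values_cm if abs(w - mode) <= half_width]

    # Stage 2: 3-sigma cut based on the distribution inside the mode window.
    mean = statistics.mean(window)
    sigma = statistics.pstdev(window)
    return [w for w in wepl_values_cm if abs(w - mean) <= n_sigma * sigma]


# Hypothetical pixel: a cluster near 10 cm plus two outlying events.
core = [9.8, 9.9, 10.0, 10.0, 10.0, 10.1, 10.2]
kept = select_protons(core + [15.0, 2.0])
```

In this example the two outlying events fall outside the mode window, so they do not inflate the standard deviation on which the 3σ cut is based.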


Fig.  2  -  Radiograph  of  a  hand  phantom  (Fig.  1)  in 
terms  of  water  equivalent  thickness  (WET)  calculated 
from  the  summed-up  stopping  power  of  the  phantom.  The 
image  shows  the  varying  thickness  of  the  hand  and  clear 
structural  details.  The  scale  on  the  right  hand  side  is  in 
cm. 


We did find that a significant percentage of pixels contained
non-Gaussian, or anomalous, WEPL distributions. These
distributions, as in Fig. 3(b), are bimodal and correspond to pixels that
lie on the boundary between two materials of different RSP.
Currently, the reconstruction algorithm selects the mode that is
closest to the mean, and the appropriate cuts are determined
based on that value. This, however, ignores valuable information
and leads to lower spatial resolution. Methods such
as averaging the two modes, or “splitting” pixels, have been
proposed but have yet to be explored.

An image of the radiographic hand phantom in terms of
WEPL (Figs. 1 and 2) was created by plotting values of WEPL
for  each  pixel  (in  cm).  The  image  clearly  depicts  the  varying 
thickness  of  the  hand  in  different  places,  and  shows  clear 
structural  details.  The  agreement  between  this  image  and  the 
phantom  shows  that  there  is  great  promise  in  our  technique. 

As a further exploration of WEPL, we investigated radiographs
of various pixel sizes: 1-mm, 0.5-mm, 0.25-mm.
The plots in Fig. 4 illustrate the image profile along the line
indicated in Fig. 1 for the various pixel sizes. Fig. 4 shows
that as pixel size is systematically decreased, the steepness
of the slope of the image profile increases from a relatively
shallow incline in the 1-mm (pixel size) plot to a steep rise
from 0 to 1 cm of WEPL in the 0.25-mm plot, due to the
improved spatial resolution with smaller pixel size. However,
decreasing the size of the pixel also increases the amount of
spatial noise added to the profile, due to the lower statistics
(fewer protons in each pixel). While some regions of the 0.5
mm and the 0.25 mm plots are relatively sharp, other regions
are entirely washed out with almost no way to tell what the
signal actually is. One can increase the number of protons,
but this will increase the dose to the patient, which should be
kept as small as possible due to the small risk of secondary
cancer. This analysis suggests that, for a given dose, there
is an ideal pixel size which will provide a balance between
spatial resolution and dose. We have found that at least 20
protons/pixel are required for reasonable statistics.

Fig. 3 - Distribution in WEPL for pixels described
by the coordinates (a) (v = 29, t = 103) and (b)
(v = 61, t = 59) before cuts are made. The black
line defines the mode of the distribution and the red
line defines the mean or “peak” of the distribution. The
blue lines indicate the mode window which contains the
particles within ±30% of the mode, and provides the
distribution on which the 3σ cuts are based. The green
lines indicate the cuts made on this specific pixel. Notice
the straggling in the large WEPL range. These values
correspond to particles that underwent nuclear interactions.
Fig. 3(a) illustrates an example of a roughly Gaussian
WEPL distribution. Fig. 3(b) is that for a boundary pixel
with a bimodal WEPL distribution.


(a) 1-mm pixels


(b) 0.5-mm pixels


Fig. 4 - Image profiles for 1-mm, 0.5-mm and 0.25-mm
pixels. Profiles show that as pixel size is decreased
from 1-mm (Fig. 4(a)) to 0.5-mm (Fig. 4(b)), the spatial
resolution increases (i.e. the details become more clear).
Further reducing the pixel size seems only to increase
statistical noise in the image (Fig. 4(c)). An ideal pixel
size must be found that maximizes spatial resolution
while minimizing dose delivered to the patient.
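The trade-off between pixel size and statistics follows from simple counting: at a fixed proton fluence, the number of protons per pixel scales with the pixel area. The sketch below illustrates this; the fluence of 100 protons/mm² is a hypothetical value, the 1/sqrt(N) noise estimate assumes Poisson counting statistics, and the function names are not from the reconstruction software.

```python
import math

MIN_PROTONS_PER_PIXEL = 20  # threshold for reasonable statistics (from the text)


def protons_per_pixel(fluence_per_mm2, pixel_size_mm):
    """Expected number of protons in a square pixel at a uniform fluence."""
    return fluence_per_mm2 * pixel_size_mm ** 2


def relative_noise(n_protons):
    """Relative statistical uncertainty, assuming Poisson counting statistics."""
    return 1.0 / math.sqrt(n_protons)


def min_fluence_per_mm2(pixel_size_mm, n_required=MIN_PROTONS_PER_PIXEL):
    """Fluence needed to reach the required number of protons per pixel."""
    return n_required / pixel_size_mm ** 2


fluence = 100.0  # protons per mm^2, hypothetical
counts = {size: protons_per_pixel(fluence, size) for size in (1.0, 0.5, 0.25)}
# counts: 100 protons at 1 mm, 25 at 0.5 mm, 6.25 at 0.25 mm
```

At this hypothetical fluence the 0.25-mm pixels fall well below the 20 protons/pixel threshold. Reaching it would take 320 protons/mm², 16 times the fluence needed for 1-mm pixels, with a corresponding increase in dose.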


III.  Multiple  Coulomb  Scattering  and  Proton 
Scattering  Radiography 


The amount that a proton is scattered between its entry and
exit from a phantom is proportional to the inverse of its energy
and can be described by the Lynch-Dahl approximation for
multiple scattering events [4]:

$$\theta_0 = \frac{13.6\ \mathrm{MeV}}{\beta c p}\, z \sqrt{\frac{x}{X_0}} \left[ 1 + 0.038 \ln\left(\frac{x}{X_0}\right) \right], \qquad (2)$$

where $\theta_0$ is the width of the Gaussian approximation for angular
deflection in a plane, $\beta c$ and $p$ are the velocity and momentum of
the proton, respectively, $z$ is the charge of the proton and $x/X_0$
is the thickness of the material traversed in radiation lengths,
where we calculate $X_0$ of the material using:

$$\frac{1}{X_0} = \sum_j \frac{w_j}{X_j}, \qquad (3)$$

where the $w_j$'s are the fractions by weight of each element in
a given material and $X_j$ is the radiation length of element $j$.
The second term in Eq. 2 tends to be small
and can thus be ignored for purposes of estimation. Note that
this approximation is good only for relatively thin objects (i.e.
$10^{-3} < x/X_0 < 100$) where the energy and momentum are
assumed to be approximately constant. For a thicker phantom,
we must account for energy loss by introducing an integral
over $x$ (see Ref. [5] for details).

A scattering radiograph (scale in mrad) is given in Fig.
5. A Gaussian distribution of scattering angles in each of
the t (vertical) and v (horizontal) planes in each pixel was
obtained. The mean v and t angles were determined in each
pixel from these distributions. These mean angles were added
in quadrature in order to obtain the mean spatial scattering
angle, defined as the angle of scattering from the beam axis.
Areas of high scattering power, such as bone, were expected to
yield greater scattering angles, while protons scattered only by
SSDs were expected to have the smallest scattering angle. The
scattering angle value was then compared with the expected
scattering estimated using Eq. 2.

Fig. 5 - This scattering radiograph shows a strong
agreement between predicted thickness given by
Eq. 2 and the thickness of real materials. Variation
in the thickness of the hand is clearly visible.
Regions of dark orange and black are those
corresponding to thick regions of bone. The blue region
in the background corresponds to the scattering
due to the SSDs alone. Scale is in mrad.


TABLE I - Densities and radiation lengths of
materials commonly encountered in pCT. Data for
bone: [6]. Data for tissue, water and silicon: [7]

Material   Density (g/cm3)   Radiation length (g/cm2)
bone       1.45              16.6
tissue     1.00              38.2
water      1.00              36.1
silicon    2.33              21.8

Table I provides radiation length values for materials that
we typically deal with in medical proton imaging. For a 200
MeV proton, β = 0.566 and p = 644 MeV/c, and therefore, by
Eq. 2, the scattering due to the four silicon tracker plates (1.6
mm total thickness) is expected to be approximately 5.2 mrad.
Comparing this estimate with the background (blue) region in
Fig. 5, we find that this estimate agrees well with the image,
which depicts scattering of 5-6 mrad due to the SSDs alone.
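The order of magnitude of this estimate can be checked with a short calculation. The sketch below evaluates the standard Highland/Lynch-Dahl expression of Eq. 2 for 1.6 mm of silicon using the Table I values. It is an illustration only: the function names are hypothetical, the proton rest energy of 938.272 MeV is the standard value rather than a number from this paper, and the factor sqrt(2) for combining two orthogonal planes assumes equal widths in both planes.

```python
import math

PROTON_REST_ENERGY_MEV = 938.272  # standard value, not taken from the text


def proton_kinematics(kinetic_mev):
    """Return (beta, momentum in MeV/c) for a proton of given kinetic energy."""
    total = kinetic_mev + PROTON_REST_ENERGY_MEV
    momentum = math.sqrt(total ** 2 - PROTON_REST_ENERGY_MEV ** 2)
    return momentum / total, momentum


def theta0_mrad(kinetic_mev, thickness_cm, density_g_cm3, x0_g_cm2,
                charge=1, with_log_term=True):
    """Plane-projected scattering width of Eq. 2, in mrad."""
    beta, momentum = proton_kinematics(kinetic_mev)
    # Thickness in radiation lengths: (cm * g/cm^3) / (g/cm^2).
    t = thickness_cm * density_g_cm3 / x0_g_cm2
    theta = 13.6 / (beta * momentum) * charge * math.sqrt(t)
    if with_log_term:
        theta *= 1.0 + 0.038 * math.log(t)
    return 1000.0 * theta


beta, momentum = proton_kinematics(200.0)
# Four silicon planes, 1.6 mm in total; density and radiation length from Table I.
plane = theta0_mrad(200.0, 0.16, 2.33, 21.8)
leading = theta0_mrad(200.0, 0.16, 2.33, 21.8, with_log_term=False)
spatial = math.sqrt(2.0) * plane  # two orthogonal planes added in quadrature
```

With these inputs the sketch gives β ≈ 0.566 and p ≈ 644 MeV/c, about 4.9 mrad from the leading term alone, about 4.1 mrad with the logarithmic correction, and about 5.8 mrad for the spatial angle. These values are comparable to the 5-6 mrad background level seen in Fig. 5; the sketch does not attempt to reproduce the exact conventions behind the estimate quoted above.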

While  the  spatial  resolution  of  the  scattering  radiograph  is 
not  as  good  as  with  the  energy-loss  radiograph,  one  can  still 
observe  regions  of  varying  thickness  around  the  edges  of  the 
fingers,  where  the  protons  traversed  only  skin  and  soft  tissue 
(yellow  and  green  region),  and  in  the  hand,  where  the  thickest 
bone  exists  (black  region).  The  scattering  angles  correspond 
to  realistic  proton  path  lengths  through  the  hand. 

A remarkable aspect of scattering radiography is that the
contrast between bone and soft tissue for proton scattering
power is, in principle, higher than that of proton stopping
power. The stopping power of bone is 50% - 80% greater
than that of water, but the scattering power of bone is
about 2.5 times that of water. Fig. 6 compares two image
profiles for the energy-loss radiograph (dashed curve) and
the scattering radiograph (solid curve). When the scattering
curve is normalized to the energy-loss curve, we find that
the general shapes of the two curves of each plot are almost
identical, which shows that in this case, regions of greater
stopping power are also regions of higher scattering power.
The energy-loss curve clearly provides higher spatial resolution,
but more importantly, it provides the RSP information
required for treatment planning. The scattering radiograph,
however, may provide us with higher contrast resolution, since
contrast depends upon the difference in material properties of
those materials being imaged. Information about the radiation
length of the material, $X_0$, can be gleaned from the scattering
radiograph and can provide us with the effective atomic
number of the material, Z (which is inversely proportional
to the radiation length). The quality and usefulness of this
information, however, requires further investigation.

Fig. 6 - Normalizing the scattering radiograph (solid
curve) to the energy-loss radiograph (dashed curve), we
see roughly the same shape and even some subtle features;
however, these are somewhat washed out. The profile
slopes of the scattering radiograph in the bottom plots
are shallower, indicating reduced spatial resolution.

IV.  Conclusion 

Our proton radiographs demonstrate the new promise of
proton imaging (proton radiography and CT), which is now within
reach of becoming a new, potentially low-dose medical imaging
modality. This work indicates that choosing an optimal pixel
size is important for balanced image quality in terms of
low-contrast and spatial resolution. The image profile comparison
suggests that scattering radiography may yield sharper edges
(greater contrast) between soft and bone tissue than energy-loss
radiography alone. However, this requires further study.
Scattering radiography (like x-ray radiography) does provide
information about the radiation length of materials, which is
inversely proportional to the effective atomic number distribution
in the tissue. Energy-loss radiography cannot provide this
information since stopping power depends only on Z/A, which
is practically identical for most soft tissues and water, leading
to very low contrast. Therefore, scattering radiography will
likely have useful applications in proton treatment planning.

1 Scope

The purpose of this document is to describe the scientific background of a
multi-institutional project on intensity modulated proton therapy, including
mathematical formulations and pertinent references relevant to this project.


2  Background 

2.1  Principles  of  IMpRT 

Intensity modulated proton radiation therapy and radiosurgery, IMpRT
and IMpRS for short, are evolving techniques for highly conformal dose delivery to
tumors or other targets in close proximity to sensitive and critical organs at
risk. IMpRT is delivered in several dose fractions, while IMpRS is delivered
as a single dose or a few (up to 5) dose fractions applying stereotactic
techniques. The underlying principle of these techniques is to aim at the
target from many different directions (either in 2D or 3D) with multiple
narrow proton beams, or pencil beams, and to modulate the intensity (or
fluence) of each beam, taking into account whether they pass through critical
organs at risk or not. The most important characteristic of a proton beam is
that it delivers a low dose in the initial part of the beam followed by a rapid
increase of dose, leading to a dose peak (the Bragg peak) and a rapid distal
dose fall-off to zero dose behind the Bragg peak. The Bragg peak is placed
inside the target at a given beam aiming point. Note that several pencil
beams sharing the same central axis can be "stacked" in the beam direction, and
this arrangement may be called a beamlet.

The starting point of each IMpRT/RS calculation is a digital model of the
patient volume of interest, e.g., the patient's head, usually provided by a
computed tomography (CT) scan. A head CT scan consists of about 200 slices
of 1-2 mm thickness, and each slice is organized into a matrix of 512 x 512
image pixels. In 3D, this creates a digital space comprising on the order of
50 million voxels. Each voxel has material properties that are needed to
calculate the proton dose delivered by the different proton pencil beams.

In  practical  applications,  one  generates  a  generic  pencil  beam  dose  model 
for  a  unit-intensity  proton  beam  in  water  and  scales  the  distance  between  the 
entry  point  of  a  proton  beam  into  the  object  and  the  beam  aiming  point  by 
multiplying  the  intersection  length  of  each  voxel  with  the  so-called  relative 
stopping  power  (RSP)  with  respect  to  water.  This  information  is  provided 
by  converting  the  numbers  provided  by  the  CT  scan  (Hounsfield  units)  to 
RSP,  using  a  HU-to-RSP  calibration  curve.  In  the  future,  the  RSP  of  voxels 
will  be  directly  reconstructed  from  a  proton  CT  (pCT)  scan.  Knowing  the 
central  beam  axis  dose  as  a  function  of  depth  in  water,  one  can  then  assign 
the  correct  dose  of  the  unit-intensity  proton  pencil  beam  to  each  voxel  on 
the  central  beam  axis.  Similarly,  knowing  the  lateral  dose  fall-off  at  each 
depth,  one  can  calculate  the  correct  dose  for  each  off-axis  voxel  based  on  its 
orthogonal  distance  from  the  beam  axis. 
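
As an illustration of the HU-to-RSP conversion step described above, the following minimal Python sketch interpolates linearly between calibration points. The calibration values and the function name `hu_to_rsp` are hypothetical placeholders chosen for the example; an actual calibration curve has to be measured for the CT scanner in use.

```python
from bisect import bisect_right

# Hypothetical (HU, RSP) calibration points, for illustration only.
CALIBRATION = [(-1000.0, 0.0), (0.0, 1.0), (1000.0, 1.5), (3000.0, 2.5)]


def hu_to_rsp(hu, curve=CALIBRATION):
    """Piecewise-linear HU-to-RSP lookup; values outside the curve are clamped."""
    hus = [point[0] for point in curve]
    if hu <= hus[0]:
        return curve[0][1]
    if hu >= hus[-1]:
        return curve[-1][1]
    k = bisect_right(hus, hu)  # index of the first calibration point with HU > hu
    (h0, r0), (h1, r1) = curve[k - 1], curve[k]
    return r0 + (r1 - r0) * (hu - h0) / (h1 - h0)
```

With these placeholder points, water (0 HU) maps to an RSP of 1.0 and values between calibration points are interpolated linearly.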

Given a distribution of the intensities of, in the limit, continuously spaced
proton pencil beams directed at the target, one can calculate the resulting
dose distribution in the voxels of the object using a proton dose operator D
that mathematically connects the two quantities. Often, the chosen
intensities do not result in a satisfactory dose distribution, i.e., one that
meets the dose constraints dictated by the radiosensitivity of the tumor and
the organs at risk. In general, one wants the target dose to exceed some
minimum value and the dose in organs at risk not to exceed a maximum value
beyond which serious complications can occur. Therefore, it is better to
"prescribe" a dose distribution selected from a subset in a continuum of
possible dose distributions that meet the clinical requirements and then to
find a fluence distribution that will lead to a dose distribution that is a
member of this "solution" subset. As we will see below, the solution of such
an "inverse" treatment planning problem can be found mathematically by
formulating a discrete mathematical model of IMpRT that can be solved, in
principle.

2.2  The  discrete  model  of  IMpRT 

In the absence of a closed-form analytic representation of the proton dose
operator $D$ that calculates the dose distribution given the fluence of a
continuum of proton pencil beams, and, therefore, the absence of such a
representation of its inverse operator $D^{-1}$, one must resort to a
fully-discretized model of the problem. The term full in "fully-discretized
model" refers to the fact that both the external proton radiation field and
the patient volume are discretized, leading to a problem formulated in a
finite-dimensional vector space. To do this we divide the beam's
cross-section into a finite rectangular grid of squares and the beam angles
into discrete angular steps separated by a constant interval, which may be
chosen differently for each IMpRT treatment plan (see Figure 1). Further, we
discretize the proton energy into steps, such that the proton Bragg peaks,
i.e., the dose maxima of the proton pencil beams, are located at
well-defined discrete aiming points within the patient volume. Each proton
pencil beam is thus assigned a discrete direction and a discrete energy.

Figure 1: Two IMpRT beams from different directions. Variable shades of
gray correspond to different fluences (number of protons per area). Note
that each square in the beam cross section can be occupied by more than
one proton pencil beam, making up a beamlet, each with a different Bragg
peak depth and intensity.
Figure 2: Example of a CT head section before (left) and after conversion to
a color-coded image that gives each voxel a tissue assignment (right).


Figure 2 (left) shows a representative two-dimensional (2D) cross-section
through the object. In a contiguous set of cross-sections, the treatment
planner defines a set of voxels that belong to the target. Other voxel sets
may be defined and assigned to an organ at risk, e.g., the brainstem, or to
other normal tissue regions, such as brain and skull bone. In order to
simplify the image segmentation process and to calculate the dose of
unit-intensity beams, each image of the CT data set needs to be processed in
order to assign a given tissue type to each voxel based on the CT (HU or
RSP) value. This is shown in Figure 2 (right).

2.3 Mathematical formulation of the discrete IMpRT model

The patient volume $\Omega$ is divided into a discrete grid of voxels, the
centers of which are the desired dose calculation points. These are
represented by the family of triplets of 3D coordinates
$\{\mathbf{r}_j \mid j = 1, 2, \ldots, J\}$. Further, we define a discrete
number of proton pencil beams by their entry direction unit vectors
$\{\mathbf{u}_i \mid i = 1, 2, \ldots, I\}$ and aiming points
$\{\mathbf{r}^{*}_i \mid i = 1, 2, \ldots, I\}$.

Let $a_{ij}$ be the dose deposited at the $j$th grid point $\mathbf{r}_j$ in
the patient volume $\Omega$ due to the $i$th pencil beam
$(\mathbf{r}^{*}_i, \mathbf{u}_i)$ of unit proton fluence and define the
$I$-dimensional vector $a^j = (a_{ij})_{i=1}^{I}$ for $j = 1, 2, \ldots, J$.
Let $x_i$ denote the actual (yet unknown) fluence of the $i$th pencil beam
$(\mathbf{r}^{*}_i, \mathbf{u}_i)$ and define the $I$-dimensional vector
$x = (x_i)_{i=1}^{I}$, which is the unknown vector of all pencil beams'
fluences that should deliver the required dose to the patient volume
$\Omega$. Finally, let $\overline{d}_j$ and $\underline{d}_j$ be an upper
bound and a lower bound on the permitted or required dose, respectively, at
the $j$th grid point $\mathbf{r}_j$ in the patient volume $\Omega$.

With  these  notions  we  can  define  discrete  forward  and  inverse  problems  of 
IMpRT  as  follows. 

The discrete forward problem of IMpRT: Given a patient volume $\Omega$,
whose physical properties are known, and a discretized (into $I$ proton
pencil beams) external proton radiation field
$\{(\mathbf{r}^{*}_i, \mathbf{u}_i) \mid i = 1, 2, \ldots, I\}$, along with
a proton pencil beam intensity vector $x$, find the discretized proton dose
distribution function $D(\mathbf{r}_j)$ for all $\mathbf{r}_j \in \Omega$.

This discrete forward problem can be solved if all $I$-dimensional vectors
$a^j = (a_{ij})_{i=1}^{I}$, for $j = 1, 2, \ldots, J$, are known to us,
e.g., by having been pre-calculated by a forward problem solver computer
package. In that case, denoting $d_j = D(\mathbf{r}_j)$ for all
$j = 1, 2, \ldots, J$, we just need to calculate

$$\sum_{i=1}^{I} a_{ij} x_i = d_j, \qquad j = 1, 2, \ldots, J. \tag{1}$$

The $J$-dimensional vector $d = (d_j)_{j=1}^{J}$, whose components are the
discretized proton dose distribution function $D(\mathbf{r}_j)$ values, is
called a dose vector.
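
The forward calculation (1) amounts to a weighted sum of the pre-calculated unit-fluence doses. The short Python sketch below illustrates this on a toy example; the function name `forward_dose` and the nested-list layout `a[i][j]` are assumptions made for the illustration and are not part of the project software.

```python
def forward_dose(a, x):
    """Return the dose vector d with d[j] = sum_i a[i][j] * x[i].

    a[i][j] is the dose at grid point j from pencil beam i at unit fluence,
    x[i] is the fluence assigned to pencil beam i.
    """
    n_beams = len(a)
    if len(x) != n_beams:
        raise ValueError("one fluence value is needed per pencil beam")
    n_points = len(a[0])
    return [sum(a[i][j] * x[i] for i in range(n_beams)) for j in range(n_points)]


# Toy example: two pencil beams, three grid points.
unit_doses = [[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]]
dose = forward_dose(unit_doses, [2.0, 4.0])
```

For realistic problem sizes (millions of voxels and many pencil beams) the unit doses would be held in a sparse matrix rather than in nested lists.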

The discrete inverse problem of IMpRT: Given are a patient volume $\Omega$,
whose physical properties are known, an upper-bound dose vector
$\overline{d} = (\overline{d}_j)_{j=1}^{J}$ and a lower-bound dose vector
$\underline{d} = (\underline{d}_j)_{j=1}^{J}$ on the permitted and required
doses, respectively, at the grid points
$\{\mathbf{r}_j \mid j = 1, 2, \ldots, J\}$ in the patient volume $\Omega$.
Find a proton pencil beam fluence vector $x$ such that

$$\underline{d}_j \le \sum_{i=1}^{I} a_{ij} x_i \le \overline{d}_j \ \text{ for all } j = 1, 2, \ldots, J \quad \text{and} \quad x_i \ge 0 \ \text{ for all } i = 1, 2, \ldots, I. \tag{2}$$

This formulation of the discrete inverse problem of IMpRT does not aim at a
proton pencil beam fluence vector $x$ that will deposit a fixed prescribed
dose in each voxel but rather calls for what is known in optimization theory
as a solution of a linear feasibility problem. The term "feasibility" refers
here to the fact that no exogenous objective function is set up for
optimization but rather any point in the feasible set
$\{x \in \mathbb{R}^{I} \mid \underline{d}_j \le \sum_{i=1}^{I} a_{ij} x_i \le \overline{d}_j \text{ for all } j = 1, 2, \ldots, J\}$
will be "acceptable" to the treatment planner. This feasibility approach to
setting up the discrete inverse problem has its roots in some early papers
on radiation therapy treatment planning where the term IMRT was not even
used, see [1, 5, 6, 7].

The $J$ individual linear feasibility constraints of (2) can be grouped
according to volumes of interest in the patient volume $\Omega$.
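
To make the notion of a feasible fluence vector concrete, the following Python sketch tests whether a candidate vector satisfies the dose bounds and nonnegativity constraints of (2). The function name `is_feasible`, the nested-list layout `a[i][j]` and the numerical tolerance are assumptions introduced for this illustration.

```python
def is_feasible(a, x, d_lower, d_upper, tol=1e-9):
    """Check d_lower[j] <= sum_i a[i][j] * x[i] <= d_upper[j] and x[i] >= 0."""
    if any(xi < -tol for xi in x):
        return False  # negative fluences are not deliverable
    for j in range(len(d_lower)):
        dose_j = sum(a[i][j] * x[i] for i in range(len(x)))
        if dose_j < d_lower[j] - tol or dose_j > d_upper[j] + tol:
            return False
    return True


# Toy example: two pencil beams, three grid points.
unit_doses = [[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]]
lower = [1.0, 2.0, 3.0]
upper = [3.0, 4.0, 5.0]
```

Any fluence vector for which `is_feasible` returns `True` is, in the sense of the feasibility formulation, an acceptable plan; no vector is preferred over another.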


3  Scientific  Tasks 

The graduate students will support the development of the Geant4 beam
libraries (Tai) and of a GPU-based platform for testing new algorithms
(Aarohi) that solve the discrete inverse problem of IMpRT. A brief summary
and motivation of each task are provided below.

3.1  Identification  and  Storage  of  Volumes  of  Interest 

The  starting  point  for  IMpRT  calculations  is  a  CT  image  set,  as  described 
in  the  background  section.  The  images  are  in  DICOM  format,  which  is  a 
standardized  medical  imaging  format.  Within  this  image  set,  the  physician 
defines  the  boundaries  of  volumes  of  interest  (VOIs)  in  pertinent  slices.  This 
task  is  usually  performed  with  a  commercial  computer  treatment  planning 
program. The program provides the tools to draw the VOI regions in individual
slices and to display them as an overlay on the original CT images. The
program  also  outputs  a  standardized  DICOM  RT  structure  set  that  contains 
the  geometrical  information  of  the  VOI  boundaries. 

The students will import the DICOM image data as well as the DICOM RT
structure set file into a Matlab program. Matlab interprets the image set as
a hypermatrix of 512 x 512 matrices that contain the numerical voxel values
(in HU) as elements. The students need to develop software that stores the
information of which voxel indices belong to each VOI in compressed sparse
row (CSR) format. This information will later be needed to assign the
individual linear feasibility constraints to the correct voxels according to
their assignment to VOIs.
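
One possible layout for this storage is sketched below in Python (the project itself plans to use Matlab). Each VOI is treated as one row of a binary membership matrix whose columns are linear voxel indices, so only a row-pointer array and a column-index array are needed. The function names and the example VOI names are hypothetical.

```python
def build_voi_csr(voi_voxels):
    """Store VOI membership in CSR form without a values array.

    voi_voxels: list of (voi_name, iterable of linear voxel indices).
    Returns (names, row_ptr, col_idx); the voxels of VOI number r are
    col_idx[row_ptr[r]:row_ptr[r + 1]].
    """
    names, row_ptr, col_idx = [], [0], []
    for name, voxels in voi_voxels:
        names.append(name)
        col_idx.extend(sorted(set(voxels)))  # sorted, duplicate-free indices
        row_ptr.append(len(col_idx))
    return names, row_ptr, col_idx


def voxels_of(row_ptr, col_idx, row):
    """Return the linear voxel indices that belong to VOI number `row`."""
    return col_idx[row_ptr[row]:row_ptr[row + 1]]


names, row_ptr, col_idx = build_voi_csr(
    [("target", [5, 3, 4]), ("brainstem", [10, 11])]
)
```

Because membership is binary, the usual CSR values array is omitted in this sketch; the row pointer alone gives the number of voxels in each VOI.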

3.2  CT  Image  Segmentation 

For the forward dose calculation, it is necessary to assign different regions
in the CT images to different materials, in this case to different human
tissues. The simplest way to do this is to define HU intervals and assign
each of them to a specific tissue, as shown in Table 1, which is the
conversion table for a pediatric head phantom with 9 different tissue types.
However, as can be seen in Figure 2, this assignment is not always perfect
due to the presence of noise and artifacts in the CT images.

The students will develop a program that finds the boundaries between
different tissue regions and will assign voxels inside these boundaries to
the correct materials. The voxel volumes are generally small enough to ignore
partial volume effects, i.e., individual voxels will be assigned only one
material type.


Table 1: Tissue categorization according to HU value.

HU Interval       Tissue
[-1000, -800)     air
[-800, -700)      sinus
[-700, 40)        soft tissue
[40, 90)          brain
[90, 150)         spinal disc
[150, 200)        trabecular bone
[200, 1000)       cortical bone
[1000, 2000)      tooth dentin
> 2000            tooth enamel


3.3  Interface  to  Geant4  Program  Output 

Geant4 is a toolkit written in C++ that performs radiation transport
calculations. The students will obtain a source model for the Geant4 forward
dose calculations. Geant4 will provide a dose model for a standard library of
proton pencil beams in water with energies between 60 MeV and 160 MeV in
10 MeV steps. The students will also develop a program that creates an array
of beam aiming points for each of a set of beam directions. The program will
then calculate the water-equivalent depth of each point by multiplying beam
axis intersection lengths by the assigned relative stopping power (RSP) for
each voxel on the central beam path and summing the products. In addition,
the water-equivalent distance of voxels lateral to the central beam axis
will need to be calculated.
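
The water-equivalent depth calculation described above can be sketched in a few lines of Python. The function name `water_equivalent_depth` and the example numbers are illustrative assumptions; the ray tracing that produces the per-voxel intersection lengths is not shown.

```python
# Beam energies of the planned library: 60 MeV to 160 MeV in 10 MeV steps.
ENERGIES_MEV = list(range(60, 161, 10))


def water_equivalent_depth(lengths_mm, rsps):
    """Sum of (intersection length x RSP) over the voxels crossed by the beam axis.

    lengths_mm[k] is the geometric path length of the central axis inside the
    k-th voxel between the entry point and the aiming point, rsps[k] is that
    voxel's relative stopping power with respect to water.
    """
    if len(lengths_mm) != len(rsps):
        raise ValueError("one RSP value is needed per intersected voxel")
    return sum(length * rsp for length, rsp in zip(lengths_mm, rsps))


# Toy example: three voxels crossed before the aiming point is reached.
depth_mm = water_equivalent_depth([2.0, 2.0, 1.0], [1.0, 1.5, 0.5])
```

The resulting depth is the one at which the generic depth-dose curve in water would be evaluated for the given aiming point.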


4  Potential  for  Publications 

The development of the beam libraries and computing platform will be
reported by the students at scientific meetings in the computer science and
medical physics fields. This will typically lead to abstracts and conference
papers with a student as first author (depending on the type of conference).
The aim is also to publish a series of original papers on solution
algorithms developed by Ran and Yair, with postdoc Ran Davidi as first author
and students as co-authors. There could well be other original papers written
by students on spin-off projects resulting from the main project.
Introduction and Objective

Computationally  demanding  numerical  minimization  techniques 
are  often  used  in  IMRT  treatment  planning,  but  the  commonly 
employed  cost  functions  and  corresponding  solution 
approaches  are  not  necessarily  the  most  appropriate  for 
achieving  the  desired  dose  behavior.  This  disconnect  occurs 
because  minimal  solutions  to  current  cost  function  formulations 
are  not  guaranteed  to  provide  the  necessary  dose  coverage, 
conformality,  or  homogeneity.  Therefore,  the  considerable 
computational  cost  associated  with  some  of  these  minimization 
techniques  may  not  be  justified. 

We propose a novel superiorization approach that substantially
improves computational tractability by producing a solution with a
reduced, but not necessarily minimal, value of the defined cost
function that is guaranteed to satisfy the given IMRT planning
constraints. Superiorization is a new paradigm that can be
viewed as lying in between feasibility-seeking for the dose
constraints and full-fledged constrained minimization of the cost
function subject to these constraints. This method is based on
the discovery that many feasibility-seeking algorithms are
perturbation-resilient, and superiorization proactively steers the
feasibility-seeking projection method towards a feasible solution
of the dose constraints with a reduced, but not necessarily
minimal, cost function value.

The  superiorization  method  produces  "superior  feasible 
solutions"  and  can  replace  current  IMRT  constrained 
minimization  methods,  potentially  leading  to  shorter 
computational  times  and  improved  dose  distributions. 

Materials  and  Methods 

We model a given IMRT problem as a linear feasibility problem by
formulating the constraints as upper- and lower-bound
vectors. The bounds depend on whether the
constrained volume is a target or an organ at risk (OAR). The
bounds reflect the dose acceptance criteria, which are
determined by the treating physician and reflect generally
accepted dose guidelines. A projection method that is
perturbation-resilient aims at solving this linear feasibility system
of hyperslab constraints. This feasibility-seeking algorithm
uses its resilience to perturbations to steer the iterates to a
superior feasible point with respect to an objective function.
Here we use ART for inequality constraints as the projection method
and the total variation (TV) [2] of the beam intensity space as the
objective function.


The complete superiorization algorithm is provided in the pseudocode (Fig. 1) and is based
on [2].

How superiorization works: The algorithm starts from an arbitrary point. In lines 7-17 it
perturbs the current point N times. A nonascending vector is computed in line 8 and the
perturbation is performed in line 13 with some step size $\beta$. The value of the objective
function $\phi$ is assessed in line 14 to make sure that the perturbation superiorized
(obtained a value that is not higher for) the objective function compared to the previous
point. At the end of the N perturbation steps, the projection method is applied and a new
point is obtained. The process repeats until the dose acceptance criteria are met.
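
The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the implementation used for the results reported here: the objective is a one-dimensional total variation of the fluence vector, the nonascending vector is taken as the negative normalized subgradient of that TV, the step sizes are the summable sequence `alpha ** ell`, and one ART sweep over the hyperslabs followed by clipping to nonnegative values stands in for the projection method. All function and parameter names are hypothetical.

```python
import math


def total_variation(x):
    """1D total variation: sum of absolute differences of neighboring entries."""
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))


def nonascending_vector(x):
    """Negative normalized subgradient of the 1D TV (zero vector if it vanishes)."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        diff = x[i + 1] - x[i]
        s = (diff > 0) - (diff < 0)  # sign of the difference
        g[i] -= s
        g[i + 1] += s
    norm = math.sqrt(sum(c * c for c in g))
    return [0.0] * len(x) if norm == 0.0 else [-c / norm for c in g]


def art_sweep(rows, lower, upper, x):
    """One ART sweep: project onto each violated hyperslab, then clip to x >= 0."""
    x = list(x)
    for a, lo, hi in zip(rows, lower, upper):
        dot = sum(ai * xi for ai, xi in zip(a, x))
        norm2 = sum(ai * ai for ai in a)
        if norm2 == 0.0 or lo <= dot <= hi:
            continue
        step = ((lo if dot < lo else hi) - dot) / norm2
        x = [xi + step * ai for xi, ai in zip(x, a)]
    return [max(0.0, xi) for xi in x]


def is_feasible(rows, lower, upper, x, tol=1e-9):
    """True if all hyperslab and nonnegativity constraints hold within tol."""
    if min(x) < -tol:
        return False
    doses = [sum(ai * xi for ai, xi in zip(a, x)) for a in rows]
    return all(lo - tol <= d <= hi + tol for d, lo, hi in zip(doses, lower, upper))


def superiorize(rows, lower, upper, y0, n_perturb=2, alpha=0.5, max_sweeps=100):
    """Alternate n_perturb TV-reducing perturbations with one ART sweep."""
    y, ell = list(y0), -1
    for k in range(max_sweeps):
        if is_feasible(rows, lower, upper, y):
            return y, k
        y_kn, n = list(y), 0
        while n < n_perturb:
            v = nonascending_vector(y_kn)
            while True:
                ell += 1
                beta = alpha ** ell  # step sizes shrink and are summable
                z = [a + beta * b for a, b in zip(y_kn, v)]
                if total_variation(z) <= total_variation(y):
                    y_kn, n = z, n + 1
                    break
        y = art_sweep(rows, lower, upper, y_kn)
    return y, max_sweeps


# Toy problem: each of three voxels receives dose from exactly one beam.
rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lower, upper = [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]
plan, sweeps = superiorize(rows, lower, upper, [0.0, 3.0, 0.0])
```

In this toy problem a plain ART sweep from the same starting point would end at (1, 2, 1) with a total variation of 2, whereas the perturbed iteration ends at a feasible point with a smaller total variation, which is the behavior superiorization is intended to produce.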

IMRT  plan:  The  anonymized  pelvic  planning  CT  of  a  prostate  cancer  patient  was  employed 
for  the  IMRT  treatment  planning  using  the  proposed  method.  Seven  equispaced  fields  were 
used  for  targeting  the  PTV.  The  dose  constraints  were  set  using  the  RTOG  0815  randomized 
trial  protocol  [3]. 

Results 

We  have  initially  tested  this  new  approach  by  comparing  the  TV-superiorization  algorithm 
with  an  otherwise  identical  algorithm  that  aimed  at  only  satisfying  the  dose  constraints 
without  applying  superiorization.  We  performed  two  experiments  with  different  starting 
conditions.  For  the  first  experiment,  we  started  the  algorithm  with  the  zero  vector  of  dose 
weights  and  for  the  second  experiment  all  dose  weights  were  given  the  value  10.  Table  1 
summarizes  the  results  for  the  two  experiments  and  in  Fig.  2  we  present  the  associated  DVH 
curves.  For  the  first  experiment,  the  TV-superiorization  produced  a  solution  that  met  the 
acceptance  criteria  after  12  iterations  whereas  the  conventional  algorithm  was  not  able  to 
reach  an  acceptable  solution  after  this  number  of  iterations.  For  the  second  experiment,  the 
superiorization  algorithm  reached  an  acceptable  solution  even  faster,  i.e.,  after  7  iterations, 
and  the  conventional  algorithm  again  failed  some  of  the  acceptance  criteria  after  this  number 
of  iterations. 


Table 1: RTOG 0815 acceptance criteria and results of the two experiments described in the Results section

Acceptance criteria | Exp 1 with superiorization | Exp 1 without superiorization | Exp 2 with superiorization | Exp 2 without superiorization
PTV min allowed dose (95% of prescribed dose) is 75.24 Gy | 75.24 Gy | 56.13 Gy | 77.80 Gy | 76.15 Gy
PTV max allowed dose: 84.74 Gy | 84.69 Gy | 89.42 Gy | 84.71 Gy | 87.63 Gy
Rectum - No more than 50% volume receives dose that exceeds 60.00 Gy | 34.50 % | 8.50 % | 36.90 % | 40.50 %
Rectum - max dose | 82.64 Gy | 82.71 Gy | 84.09 Gy | 87.25 Gy


Conclusions 


Our proposed method successfully produced conformal solutions that met the acceptance
criteria, whereas an otherwise identical algorithm without superiorization failed to do so
within the same number of iterations. Future work will assess the computational gain of the
superiorization method compared to a conventional one and investigate its utility for
computationally more complex problems such as Volumetric Modulated Arc Therapy (VMAT).
Fig. 1: Pseudocode of the Superiorization Algorithm.

1.  set $k = 0$
2.  set $y^k = y^0$
3.  set $\ell = -1$
4.  repeat
5.      set $n = 0$
6.      set $y^{k,n} = y^k$
7.      while $n < N$
8.          set $v^{k,n}$ to be a nonascending vector for $\phi$ at $y^{k,n}$
9.          set loop = true
10.         while loop
11.             set $\ell = \ell + 1$
12.             set $\beta_{k,n} = \eta_\ell$
13.             set $z = y^{k,n} + \beta_{k,n} v^{k,n}$
14.             if $\phi(z) \le \phi(y^k)$ then
15.                 set $n = n + 1$
16.                 set $y^{k,n} = z$
17.                 set loop = false
18.     set $y^{k+1} = A_C(y^{k,N})$
19.     set $k = k + 1$

Fig.  2:  Dose  Volume  Histograms  (DVH)  of  the  two  experiments.  Solid 
lines  represent  the  algorithm  with  TV-superiorization  (broken  lines 
represent  no  superiorization).  The  first  (top)  took  12  iterations  and  the 
second  (bottom)  took  7  iterations.  Exact  numbers  are  given  in  Table  1. 
are often used in medical applications such as radiation therapy treatment
planning  and  computerized  tomography.  They  often  employ  cost  functions 
and  corresponding  solution  approaches  that  are  not  necessarily  most 
appropriate  for  achieving  the  desired  solutions.  This  disconnect  occurs 
because  minimal  solutions  to  current  cost  function  formulations  are  not 
guaranteed  to  provide  the  optimal  solution  from  the  point  of  view  of  the 
application.  Therefore,  the  considerable  computational  cost  associated  with 
some  of  these  minimization  techniques  may  not  be  justified.  Superiorization 
is  a  new  paradigm  that  substantially  improves  computational  tractability  by 
producing  a  solution  with  reduced,  but  not  necessarily  minimal,  value  of  a 
defined  cost  function  that  is  guaranteed  to  satisfy  the  constraints  of  the 
problem. The ability to do so stems from the fact that many feasibility-seeking
projection methods are perturbation-resilient, which makes it possible to steer
the process to a solution with a reduced (i.e., superior) cost function value. In
this  talk  we  present  how  superiorization  can  be  applied  to  real-world 
applications  and  demonstrate  its  usefulness  with  a  few  examples  taken  from 
the  medical  field. 


Title: Projection-based scheme for solving convex constrained optimization problems

Speaker:  Aviv  Gibali,  Fraunhofer  Institute  for  Industrial  Mathematics 
(ITWM),  Kaiserslautern,  Germany 

Abstract: In this talk we present a new projection-based scheme for general
convex constrained optimization problems. The general idea is to transform the
original optimization problem into a sequence of feasibility problems by
iteratively constraining the objective function from above until the feasibility
problem becomes inconsistent. Then, for each of the feasibility problems, one may
apply any of the existing projection methods, which are known to be very
efficient and practical. Some numerical experiments illustrate the
performance of the suggested scheme.
Title: The cyclic Douglas-Rachford algorithm

Speaker:  Rafiq  Mansour,  University  of  Haifa,  Israel 

Abstract: The Douglas-Rachford (DR) algorithm is a projection method for
finding a point in the nonempty intersection of two sets. It has drawn
great attention in the literature recently. We review recent results on the
cyclic  Douglas-Rachford  algorithm  which  extends  the  DR  algorithm  to  handle 
a  family  of  n  sets.  Our  presentation  is  based  on  a  recent  paper  on  this  topic  by 
J.M.  Borwein  and  M.K.  Tam. 


Title:  Porosity  and  the  bounded  linear  regularity  property 

Speaker:  Simeon  Reich,  The  Technion,  Israel 

Abstract:  H.  H.  Bauschke  and  J.  M.  Borwein  showed  that  in  the  space  of  all 
tuples  of  bounded,  closed  and  convex  subsets  of  a  Hilbert  space  with  a 
nonempty  intersection,  a  typical  tuple  has  the  bounded  linear  regularity 
property.  This  property  is  important  because  it  leads  to  the  convergence  of 
infinite  products  of  the  corresponding  nearest  point  projections  to  a  point  in 
the  intersection.  We  show  that  the  subset  of  all  tuples  possessing  the  bounded 
linear  regularity  property  has  a  porous  complement.  Moreover,  our  result  is 
established  in  all  normed  spaces  and  for  tuples  of  closed  and  convex  sets 
which  are  not  necessarily  bounded. 

This  is  joint  work  with  A.  J.  Zaslavski. 